Foreword

(This foreword is not part of this standard) 

This Enhanced Digital Access Communications System (EDACS™) IMBE Vocoder description describes 
the vocoder for land mobile radios meeting EDACS requirements. 

As a group, the family of Enhanced Digital Access Communications Systems documents describes the
system, inclusive of the equipment requirements which allow both compatibility and interoperability 
between various systems and elements. The family of documents will be backward compatible and 
interoperable with existing installed Enhanced Digital Access Communications Systems as further defined 
within this family of documents. 

This document has been developed with inputs from the Vocoders subcommittee (TR-8.4), TIA Industry 
Members, and the International EDACS User's Group, under the sponsorship of TIA. 

For information on specific implementations, as they are developed, the reader is referred to the EDACS
System and Shell Standard, originally published as TSB69, for an EDACS Overview, a Glossary, and a
Statement of Requirements.

The reader's attention is called to the possibility that compliance with this Standard may require the use of 
one or more inventions covered by patent rights. 

By publication of this Interim Standard, no position is taken with respect to the validity of those claims or 
any patent rights in connection therewith. The patent holders so far identified have, however, filed 
statements of willingness to grant licenses under those rights on reasonable and nondiscriminatory terms 
and conditions to applicants desiring to obtain such licenses. Details may be obtained from the publisher. 


Jim Holthaus 
Chairman TR-8.4 
1 Scope

This document specifies a voice coding method for the Enhanced Digital Access Communication System. 
It describes the functional requirements for the transmission and reception of voice information using digital 
communication media described in the standard. This document is specifically intended to define the 
conversion of voice from an analog representation to a digital representation that consists of a net bit rate of 
4.4 kbps for voice information, and a gross bit rate of 7.1 kbps after error control coding. 

The voice coder (or vocoder) presented in this document is intended to be used throughout a system in any
equipment that requires an analog-to-digital or digital-to-analog voice interface. Specifically, mobile and 
portable radios as well as console equipment and gateways to voice networks may contain the vocoder 
described in this document. 


2 Introduction 

This document provides a functional description of the Improved Multi-Band Excitation (IMBE) voice 
coding algorithm adopted for Enhanced Digital Access Communications Systems. This document describes 
the essential operations that are necessary and sufficient to implement this voice coding algorithm. 
However, it is highly recommended that the references be studied prior to the implementation of this 
algorithm. It is also recommended that implementations begin with a high-level language simulation of the 
algorithm, and then proceed to a real-time implementation using a digital signal processor. High 
performance real-time implementations have been demonstrated using both floating-point and fixed-point 
processors. The reader is cautioned that this document does not attempt to describe the most efficient means 
of implementing the IMBE vocoder. The reader should consult one or more references on efficient real-time 
programming for more information on this subject. Additionally this document does not address vocoder 
testing and verification. These subjects will be addressed in separate documents that may be released at a 
later time. 
Figure 1: Improved Multi-Band Excitation Speech Coder


The IMBE speech coder is based on a robust speech model which is referred to as the Multi-Band 
Excitation (MBE) speech model [3]. The basic methodology of the coder is to divide a digital speech input 
signal into overlapping speech segments (or frames) using a window such as a Kaiser window. Each speech 
frame is then compared with the underlying speech model, and a set of model parameters are estimated for 
that particular frame. The encoder quantizes these model parameters and transmits a bit stream at 7.1 kbps. 
The decoder receives this bit stream, reconstructs the model parameters, and uses these model parameters to 
generate a synthetic speech signal. This synthesized speech signal is the output of the IMBE speech coder as 
shown in Figure 1. One should note that the IMBE speech coder shown in this figure and defined by this 
document is a digital-to-digital function. 

The IMBE speech coder is a model-based speech coder, or vocoder, which does not try to reproduce the 
input speech signal on a sample by sample basis. Instead the IMBE speech coder constructs a synthetic 
speech signal which contains the same perceptual information as the original speech signal. Many previous 
vocoders (such as LPC vocoders, homomorphic vocoders, and channel vocoders) have not been successful 
in producing high quality synthetic speech. The IMBE speech coder has two primary advantages over these 
vocoders. First, the IMBE speech coder is based on the MBE speech model which is a more robust model 
than the traditional speech models used in previous vocoders. Second, the IMBE speech coder uses more 
sophisticated algorithms to estimate the speech model parameters, and to synthesize the speech signal from 
these model parameters. 

This document is organized as follows. In Section 3 the MBE speech model is briefly reviewed. This 
section presents background material which is useful in understanding operation of the IMBE speech coder. 
Section 4 describes the basic speech input/output requirements. Section 5 examines the methods used to
estimate the speech model parameters, and Section 6 examines the quantization and reconstruction of the
MBE model parameters. The error correction and the format of the 7.1 kbps bit stream are discussed in
Section 7. This is followed by Section 8 which describes the enhancement of the spectral amplitudes, and
Section 9 which describes the adaptive smoothing method used to reduce the effect of uncorrectable bit
errors. Section 10 then demonstrates the encoding of a typical set of model parameters. Section 11 discusses
the synthesis of speech from the MBE model parameters. A few additional comments on the algorithm and
this document are provided in Section 12. Other information such as bit allocation tables, quantization
levels and initialization vectors is contained in the attached appendices. In addition, Appendix K contains
a set of flow charts describing certain elements of this vocoder. Note that these flow charts have been 
designed to help clarify the various algorithmic steps and do not necessarily describe the best or most 
efficient method of implementing the vocoder. 


3 Multi-Band Excitation Speech Model 

Let s(n) denote a discrete speech signal obtained by sampling an analog speech signal. In order to focus
attention on a short segment of speech over which the model parameters are assumed to be constant, a
window w(n) is applied to the speech signal s(n). The windowed speech signal $s_w(n)$ is defined by

$$s_w(n) = s(n)\,w(n) \tag{1}$$

The sequence $s_w(n)$ is referred to as a speech segment or a speech frame. The IMBE analysis algorithm
actually uses two different windows, $w_R(n)$ and $w_I(n)$, each of which is applied separately to the speech
signal via Equation (1). This will be explained in more detail in Section 5 of this document. The speech
signal s(n) is shifted in time to select any desired segment. For notational convenience $s_w(n)$ refers to the
current speech frame. The next speech frame is obtained by shifting s(n) by 20 ms.

A speech segment $s_w(n)$ is modelled as the response of a linear filter $h_w(n)$ to some excitation
signal $e_w(n)$. Therefore, $S_w(\omega)$, the Fourier Transform of $s_w(n)$, can be expressed as

$$S_w(\omega) = H_w(\omega)\,E_w(\omega) \tag{2}$$

where $H_w(\omega)$ and $E_w(\omega)$ are the Fourier Transforms of $h_w(n)$ and $e_w(n)$, respectively.
In traditional speech models, speech is divided into two classes depending upon the nature of the
excitation signal. For voiced speech the excitation signal is a periodic impulse sequence, where the distance
between impulses is the pitch period $P_0$. For unvoiced speech the excitation signal is a white noise
sequence. One of the primary distinctions between traditional vocoders is the method in which they model
the linear filter $h_w(n)$. The frequency response of this filter is generally referred to as the spectral
envelope of the speech signal. In an LPC vocoder, for example, the spectral envelope is modeled with a low
order all-pole model. Similarly, in a homomorphic vocoder, the spectral envelope is modeled with a small 
number of cepstral coefficients. 

A primary difference between traditional speech models and the MBE speech model is the 
excitation signal. In conventional speech models a single voiced/unvoiced (V/UV) decision is used for each 
speech segment. In contrast the MBE speech model divides the excitation spectrum into a number of
non-overlapping frequency bands and makes a V/UV decision for each frequency band. This allows the
excitation signal for a particular speech segment to be a mixture of periodic (voiced) energy and noise-like 
(unvoiced) energy. This added degree of freedom in the modelling of the excitation allows the MBE speech 
model to generate higher quality speech than conventional speech models. In addition it allows the MBE 
speech model to be robust to the presence of background noise. 

In the MBE speech model the excitation spectrum is obtained from the pitch period (or the 
fundamental frequency) and the V/UV decisions. A periodic spectrum is used in the frequency bands 
declared voiced, while a random noise spectrum is used in the frequency bands declared unvoiced. The 
periodic spectrum is generated from a windowed periodic impulse train which is completely determined by 
the window and the pitch period. The random noise spectrum is generated from a windowed random noise 
sequence. 


[Figure 2 labels: Voiced Speech Spectrum; Unvoiced Speech Spectrum; Speech Spectrum; Traditional Speech Model; MBE Speech Model]


Figure 2: Comparison of Traditional and MBE Speech Models 


A comparison of a traditional speech model and the MBE speech model is shown in Figure 2. In 
this example the traditional model has classified the speech segment as voiced, and consequently the
traditional speech model is composed completely of periodic energy. The MBE model has divided the
spectrum into 10 frequency bands in this example. The fourth, fifth, ninth and tenth bands have been 
declared unvoiced while the remaining bands have been declared voiced. The excitation in the MBE model 
is comprised of periodic energy only in the frequency bands declared voiced, while the remaining bands are 
comprised of noise-like energy. This example shows an important feature of the MBE speech model. 
Namely, the V/UV determination is performed such that frequency bands where the ratio of periodic energy 
to noise-like energy is high are declared voiced, while frequency bands where this ratio is low are declared 
unvoiced. The details of this procedure are discussed in Section 5.2. 


4 Speech Input/Output Requirements 


This section presents a number of performance recommendations for the analog front end of a voice codec, 
including the gain, filtering, and conversion elements as depicted in Figure 3. The objective is to establish a 
set of input/output requirements that will ensure that the voice codec operates at its maximum capability. 
The reader should note that Figure 3 shows four reference points (analog input, analog output, digital input 
and digital output) which are used in this document and will be used in future documents describing the test 
and verification procedure used with this vocoder. 



[Figure 3 labels: Analog Speech; Digital Speech (8 kHz sampling); IMBE; IMBE Decoder]


Fig. 3: Analog Front End 


The voice encoder and decoder defined in the remainder of this document operate with unity (i.e.
0 dB) gain. Consequently the analog input and output gain elements shown in Figure 3 are only used to
match the sensitivity of the microphone and speaker with the A-to-D converters and D-to-A converters,
respectively. It is recommended that the analog input be set such that the RMS speech level under nominal
input conditions is 25 dB below the saturation point of the A-to-D converter. This level (-22 dBm0) is
designed to provide sufficient margin to prevent the peaks of the speech waveform from being clipped by 
the A-to-D converter. 
The voice coder defined in this document requires the A-to-D and D-to-A converters to operate at
an 8 kHz sampling rate (i.e. a sampling period of 125 microseconds) at the digital input/output reference 
points. This requirement necessitates the use of analog filters at both the input and output to eliminate any 
frequency components above the Nyquist frequency (4 kHz). The recommended input and output filter 
masks are shown in Figure 4. For proper operation, the frequency response of the analog filters should be 
bounded by the shaded zone depicted in this figure. 



Fig. 4: Analog Input/Output Filter Mask 

This vocoder description assumes that the A-to-D converter produces digital speech which is 
confined to the range [-32768, 32767], and similarly that the D-to-A converter accepts digital speech within 
this same range. If a converter is used which does not meet these assumptions then the digital gain elements 
shown in Figure 1 should be adjusted appropriately. Note that these assumptions are automatically satisfied 
if 16 bit linear A-to-D and D-to-A converters are used, in which case the digital gain elements should be set
to unity gain. Also note that the vocoder requires that any companding which is applied by the A-to-D
converter (i.e. A-law or μ-law) should be removed prior to speech encoding. Similarly any companding used
by the D-to-A converter must be applied after speech decoding. 
Fig. 5: IMBE Speech Analysis Algorithm


5 Speech Analysis 

This section presents the methods used to estimate the MBE speech model parameters. To develop a high 
quality vocoder it is essential that robust and accurate algorithms are used to estimate the model parameters. 
The approach which is presented here differs from conventional approaches in a fundamental way. 
Typically algorithms for the estimation of the excitation parameters and algorithms for the estimation of the 
spectral envelope parameters operate independently. These parameters are usually estimated based on some 
reasonable but heuristic criterion without explicit consideration of how close the synthesized speech will be 
to the original speech. This can result in a synthetic spectrum quite different from the original spectrum. In
the approach used in the IMBE speech coder the excitation and spectral envelope parameters are estimated
simultaneously, so that the synthesized spectrum is closest in a least squares sense to the original speech
spectrum. This approach can be viewed as an “analysis-by-synthesis” method. The theoretical derivation
and justification of this approach is presented in references [3, 4, 6].

A block diagram of the analysis algorithm is shown in Figure 5. The MBE speech model 
parameters which must be estimated for each speech frame are the pitch period (or equivalently the 
fundamental frequency), the V/UV decisions, and the spectral amplitudes which characterize the spectral 
envelope. The organization of this section is as follows. First, the pitch estimation method is presented in 
Section 5.1. The V/UV determination is discussed in Section 5.2, and finally Section 5.3 discusses the 
estimation of the spectral amplitudes. 

The input to the speech analyzer, and, consequently, the encoder, is a discrete speech signal 
generated using an A-to-D converter as described in Section 4. This speech signal must first be digitally 
filtered to remove any residual energy at D.C. This is accomplished by passing the input signal through a
discrete high-pass filter with the following transfer function:

$$H(z) = \frac{1 - z^{-1}}{1 - .99\,z^{-1}} \tag{3}$$


The resulting high-pass filtered signal is denoted by s(n) throughout the remainder of this section. Figure 6
shows the frequency response of the filter specified in equation (3) using the convention that the Nyquist
frequency (4 kHz) is mapped to a discrete frequency of $\pi$ radians. For more information on this frequency
convention, which is used throughout this document, the reader is referred to reference [11].
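
The transfer function in equation (3) corresponds to the difference equation y(n) = x(n) - x(n-1) + .99 y(n-1). The following is a minimal Python sketch of that recursion, assuming zero initial filter state; the function name `remove_dc` and the `pole` parameter are illustrative and do not appear in this standard.

```python
def remove_dc(samples, pole=0.99):
    """Apply H(z) = (1 - z^-1) / (1 - pole * z^-1), starting from zero state."""
    out = []
    prev_in = 0.0   # x(n-1)
    prev_out = 0.0  # y(n-1)
    for x in samples:
        y = x - prev_in + pole * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

Feeding a constant (D.C.) input produces an output that decays geometrically toward zero, which is the intended behaviour of this filter.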



Fig. 6: High Pass Filter Frequency Response at 8 kHz Sampling Rate 


[Figure 7 labels: Previous Frames; Present Frame]

Fig. 7: Relationship between Speech Frames
5.1 Pitch Estimation

The objective in pitch estimation is to determine the pitch $P_0$ corresponding to the “current” speech frame
$s_w(n)$. $P_0$ is related to the fundamental frequency $\omega_0$ by

$$P_0 = \frac{2\pi}{\omega_0} \tag{4}$$

where $P_0$ is measured in samples (at 8 kHz) and $\omega_0$ is measured in radians.
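
As an illustration of Equation (4), the short Python sketch below converts between the pitch period in samples and the fundamental frequency in radians, and also reports the pitch in Hz for the 8 kHz sampling rate. The helper names are illustrative only.

```python
import math

SAMPLE_RATE_HZ = 8000  # sampling rate stated in Section 4

def pitch_to_omega(pitch_samples):
    """Fundamental frequency in radians for a pitch period in samples."""
    return 2.0 * math.pi / pitch_samples

def omega_to_pitch(omega0):
    """Pitch period in samples for a fundamental frequency in radians."""
    return 2.0 * math.pi / omega0

def pitch_to_hz(pitch_samples):
    """Fundamental frequency in Hz at the 8 kHz sampling rate."""
    return SAMPLE_RATE_HZ / pitch_samples
```

For example, a pitch period of 80 samples corresponds to a fundamental frequency of 100 Hz.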

The pitch estimation algorithm attempts to preserve some continuity of the pitch between
neighboring speech frames. A pitch tracking algorithm considers the pitch from previous and future frames
when determining the pitch of the current frame. Previous and future speech frames are obtained by shifting
the speech signal in 160 sample (20 ms) time increments prior to the application of the window in Equation
(1). The pitches corresponding to the two future speech frames are denoted by $P_1$ and $P_2$. Similarly, the
pitches of the two previous speech frames are denoted by $P_{-1}$ and $P_{-2}$. These relationships are shown in
Figure 7.

The pitch is estimated using a two-step procedure. First an initial pitch estimate, denoted by $\hat{P}_I$, is
obtained. The initial pitch estimate is restricted to be a member of the set {21, 21.5, ... 121.5, 122}. It is
then refined to obtain the final estimate of the fundamental frequency $\hat{\omega}_0$, which has one-quarter-sample
accuracy. This two-part procedure is used in part to reduce the computational complexity, and in part to
improve the robustness of the pitch estimate.

One important feature of the pitch estimation algorithm is that the initial pitch estimation algorithm
uses a different window than the pitch refinement algorithm. The window used for initial pitch estimation,
$w_I(n)$, is 301 samples long and is given in Annex B. The window used for pitch refinement (and also for
spectral amplitude estimation and V/UV determination), $w_R(n)$, is 221 samples long and is given in
Annex C. Throughout this document the window functions are assumed to be equal to zero outside the
range given in the Annexes. The center point of the two windows must coincide, therefore the first non-zero
point of $w_R(n)$ must begin 40 samples after the first non-zero point of $w_I(n)$. This constraint is typically
met by adopting the convention that $w_R(n) = w_R(-n)$ and $w_I(n) = w_I(-n)$, as shown in Figure 8.



Fig. 8: Window Alignment 


The amount of overlap between neighboring speech segments is a function of the window length.
Specifically the overlap is equal to the window length minus the distance between frames (160 samples).
Therefore the overlap when using $w_R(n)$ is equal to 61 samples and the overlap when using $w_I(n)$ is
equal to 141 samples.
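
The window arithmetic quoted in this section can be checked with a few lines of Python. The constant and function names below are illustrative; the numeric values (160, 301, 221) are taken from the text.

```python
FRAME_SHIFT = 160    # samples between consecutive frames (20 ms at 8 kHz)
LEN_W_I = 301        # length of the initial pitch estimation window
LEN_W_R = 221        # length of the pitch refinement window

def overlap(window_length, frame_shift=FRAME_SHIFT):
    """Samples shared by two neighboring segments using the same window."""
    return window_length - frame_shift

def center_offset(long_length, short_length):
    """Delay of the shorter window's first sample when both are centered."""
    return (long_length - short_length) // 2
```

Here `overlap(LEN_W_R)` gives 61, `overlap(LEN_W_I)` gives 141, and `center_offset(LEN_W_I, LEN_W_R)` gives the 40 sample offset between the first non-zero points of the two windows.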


5.1.1 Determination of E(P) 


To obtain the initial pitch estimate an error function, E(P), is evaluated for every P in the set {21, 21.5, ...
121.5, 122}. Pitch tracking is then used to compare the evaluations of E(P), and the best candidate from this
set is chosen as $\hat{P}_I$. This procedure is shown in Figure 9. The function E(P) is defined by

$$E(P) = \frac{\sum_{j=-150}^{150} s_{LPF}^2(j)\,w_I^2(j) \;-\; P \cdot \sum_{n=-\lfloor 150/P \rfloor}^{\lfloor 150/P \rfloor} r(n \cdot P)}{\left[\sum_{j=-150}^{150} s_{LPF}^2(j)\,w_I^2(j)\right]\left[1 - P \cdot \sum_{j=-150}^{150} w_I^4(j)\right]} \tag{5}$$

where $w_I(n)$ is normalized to meet the constraint

$$\sum_{j=-150}^{150} w_I^2(j) = 1.0 \tag{6}$$
This constraint is satisfied for $w_I(n)$ listed in Annex B. The function r(t) is defined for integer values of t
by

$$r(t) = \sum_{j=-150}^{150} s_{LPF}(j)\,w_I^2(j)\,s_{LPF}(j+t)\,w_I^2(j+t) \tag{7}$$

The function r(t) is evaluated at non-integer values of t through linear interpolation:

$$r(t) = (1 + \lfloor t \rfloor - t) \cdot r(\lfloor t \rfloor) + (t - \lfloor t \rfloor) \cdot r(\lfloor t \rfloor + 1) \tag{8}$$

where $\lfloor x \rfloor$ is equal to the largest integer less than or equal to x (i.e. truncating values of x). The low-pass
filtered speech signal is given by

$$s_{LPF}(n) = \sum_{j=-10}^{10} s(n-j)\,h_{LPF}(j) \tag{9}$$

where $h_{LPF}(n)$ is the 21 point FIR filter given in Annex D.

The theoretical justification for the error function E(P) is presented in [3, 6]. The initial pitch
estimate $\hat{P}_I$ is chosen such that $E(\hat{P}_I)$ is small; however, $\hat{P}_I$ is not chosen simply to minimize E(P).
Instead pitch tracking must be used to account for pitch continuity between neighboring speech frames.
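
The evaluation of E(P) can be sketched in Python with NumPy as follows. This is an illustrative sketch only: the Annex B window is not reproduced here, so `placeholder_window` substitutes a Hamming window rescaled so that the sum of its squared samples is 1.0, and the input is assumed to have been low-pass filtered already. All function and variable names are hypothetical.

```python
import numpy as np

HALF_LEN = 150  # w_I(n) is nonzero for -150 <= n <= 150 (301 samples)

def placeholder_window():
    """Stand-in for the Annex B window (NOT the real table), scaled so that
    the sum of w_I(n)^2 equals 1.0."""
    w = np.hamming(2 * HALF_LEN + 1)
    return w / np.sqrt(np.sum(w ** 2))

def error_function(s_lpf, w_i, pitch):
    """Evaluate E(P) for one candidate pitch.

    s_lpf and w_i are length-301 arrays holding samples n = -150..150.
    """
    x = s_lpf * w_i ** 2                      # s_LPF(j) * w_I(j)^2
    r_full = np.correlate(x, x, mode="full")  # r(t) for integer t = -300..300
    zero_lag = 2 * HALF_LEN                   # index of t = 0 in r_full

    def r(t):
        # Linear interpolation between the two neighboring integer lags.
        lo = int(np.floor(t))
        frac = t - lo
        return (1.0 - frac) * r_full[zero_lag + lo] + frac * r_full[zero_lag + lo + 1]

    energy = np.sum(s_lpf ** 2 * w_i ** 2)
    n_max = int(np.floor(HALF_LEN / pitch))
    periodic = sum(r(n * pitch) for n in range(-n_max, n_max + 1))
    return (energy - pitch * periodic) / (energy * (1.0 - pitch * np.sum(w_i ** 4)))
```

For a signal that is periodic with a period of 40 samples, the value returned for a candidate pitch of 40 is expected to be much smaller than the value returned for a candidate such as 50.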

5.1.2 Pitch Tracking 

Pitch tracking is used to improve the pitch estimate by attempting to limit the pitch deviation between 
consecutive frames. If the pitch estimate is chosen to strictly minimize E(P), then the pitch estimate may 
change abruptly between succeeding frames. This abrupt change in the pitch can cause degradation in the 
synthesized speech. In addition, pitch typically changes slowly; therefore, the pitch estimates from 
neighboring frames can aid in estimating the pitch of the current frame.



Fig. 9: Initial Pitch Estimation

For each speech frame two different pitch estimates are computed. The first, $\hat{P}_B$, is a backward
estimate which maintains pitch continuity with previous speech frames. The second, $\hat{P}_F$, is a forward
estimate which maintains pitch continuity with future speech frames. The backward pitch estimate is
calculated with the look-back pitch tracking algorithm, while the forward pitch estimate is calculated with
the look-ahead pitch tracking algorithm. These two estimates are compared with a set of decision rules
defined below, and either the backward or forward estimate is chosen as the initial pitch estimate, $\hat{P}_I$.


5.1.3 Look-Back Pitch Tracking 

Let $\hat{P}_{-1}$ and $\hat{P}_{-2}$ denote the initial pitch estimates which are calculated during the analysis of the previous
two speech frames. Let $E_{-1}(P)$ and $E_{-2}(P)$ denote the error functions of Equation (5) obtained from the
analysis of these previous two frames as shown in Figure 7. Then $E_{-1}(\hat{P}_{-1})$ and $E_{-2}(\hat{P}_{-2})$ will have
some specific values. Upon initialization the error functions $E_{-1}(P)$ and $E_{-2}(P)$ are assumed to be equal
to zero, and $\hat{P}_{-1}$ and $\hat{P}_{-2}$ are assumed to be equal to 100.

Since pitch continuity with previous frames is desired, the pitch for the current speech frame is
considered in a range near $\hat{P}_{-1}$. First, the error function E(P) is evaluated at each value of P which satisfies
constraints (10) and (11).

$$.8\,\hat{P}_{-1} \le P \le 1.2\,\hat{P}_{-1} \tag{10}$$

$$P \in \{21, 21.5, \ldots, 121.5, 122\} \tag{11}$$

These values of E(P) are compared and $\hat{P}_B$ is defined as the value of P which satisfies these constraints and
which minimizes E(P). The backward cumulative error $CE_B(\hat{P}_B)$ is then computed using the following
formula:

$$CE_B(\hat{P}_B) = E(\hat{P}_B) + E_{-1}(\hat{P}_{-1}) + E_{-2}(\hat{P}_{-2}) \tag{12}$$


The backward cumulative error provides a confidence measure for the backward pitch estimate. It is 
compared against the forward cumulative error using a set of heuristics defined in Section 5.1.4. This 
comparison determines whether the forward pitch estimate or the backward pitch estimate is selected as the 
initial pitch estimate for the current frame. 
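
The look-back procedure can be sketched in Python as below. The sketch assumes the current frame's error function is available as a callable and that the two previous error values have been stored; the inclusive bounds in the range test are an assumption, and the names `pitch_grid` and `look_back` are illustrative.

```python
def pitch_grid():
    """The candidate set {21, 21.5, ..., 121.5, 122}."""
    return [21 + 0.5 * i for i in range(203)]

def look_back(error_fn, prev_pitch, prev_err_1, prev_err_2):
    """Return the backward pitch estimate and its cumulative error.

    error_fn   : callable giving E(P) for the current frame
    prev_pitch : initial pitch estimate of the previous frame
    prev_err_1 : E(P) of the previous frame at its own pitch estimate
    prev_err_2 : E(P) of the frame before that at its own pitch estimate
    """
    candidates = [p for p in pitch_grid()
                  if 0.8 * prev_pitch <= p <= 1.2 * prev_pitch]
    p_b = min(candidates, key=error_fn)
    ce_b = error_fn(p_b) + prev_err_1 + prev_err_2
    return p_b, ce_b
```

At start-up the text specifies a previous pitch of 100 and previous errors of zero, so the first call would be `look_back(error_fn, 100, 0.0, 0.0)`.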


TIA/EIA/IS-69.5 


5.1.4 Look-Ahead Pitch Tracking 

Look-ahead tracking attempts to preserve pitch continuity between future speech frames. Let $E_1(P)$ and
$E_2(P)$ denote the error functions of Equation (5) obtained from the two future speech frames as shown in
Figure 7. Since the pitch has not been determined for these future frames, the look-ahead pitch tracking
algorithm must select the pitch of these future frames. This is done in the following manner. First, $P_0$ is
assumed to be fixed. Then the $P_1$ and $P_2$ are found which jointly minimize $E_1(P_1) + E_2(P_2)$, subject to
constraints (13) through (16).

$$P_1 \in \{21, 21.5, \ldots, 121.5, 122\} \tag{13}$$

$$.8 \cdot P_0 \le P_1 \le 1.2 \cdot P_0 \tag{14}$$

$$P_2 \in \{21, 21.5, \ldots, 121.5, 122\} \tag{15}$$

$$.8 \cdot P_1 \le P_2 \le 1.2 \cdot P_1 \tag{16}$$

The values of $P_1$ and $P_2$ which jointly minimize $E_1(P_1) + E_2(P_2)$ subject to these constraints are
denoted by $\hat{P}_1$ and $\hat{P}_2$, respectively. Once $\hat{P}_1$ and $\hat{P}_2$ have been computed the forward cumulative error
function $CE_F(P_0)$ is computed according to:

$$CE_F(P_0) = E(P_0) + E_1(\hat{P}_1) + E_2(\hat{P}_2) \tag{17}$$

This process is repeated for each $P_0$ in the set {21, 21.5, ... 121.5, 122}. The corresponding values of
$CE_F(P_0)$ are compared and $\hat{P}_0$ is defined as the value of $P_0$ in this set which results in the minimum
value of $CE_F(P_0)$. Note that references [3, 6] should be consulted for more information on the theory and
implementation of the look-ahead pitch tracking algorithm.
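
A direct, unoptimized Python sketch of the joint minimization is shown below. It assumes the three error functions are supplied as callables and treats the range constraints as inclusive; the function name and argument names are illustrative and are not taken from the references.

```python
def forward_cumulative_error(e0, e1, e2, grid):
    """Return (P0_hat, table), where table[P0] is the forward cumulative error.

    e0, e1, e2 : callables giving E(P) for the current frame and the two
                 future frames, respectively
    grid       : the candidate pitch set {21, 21.5, ..., 122}
    """
    def in_range(p, ref):
        return 0.8 * ref <= p <= 1.2 * ref

    # For each P1, the smallest E2(P2) reachable within 0.8*P1..1.2*P1.
    best_e2 = {p1: min(e2(p2) for p2 in grid if in_range(p2, p1))
               for p1 in grid}

    table = {}
    for p0 in grid:
        # Jointly minimize E1(P1) + E2(P2) with P1 within 0.8*P0..1.2*P0.
        tail = min(e1(p1) + best_e2[p1] for p1 in grid if in_range(p1, p0))
        table[p0] = e0(p0) + tail

    p0_hat = min(grid, key=table.get)
    return p0_hat, table
```

Precomputing the best reachable value of the second future error for every candidate of the first avoids a triple nested loop without changing the result of the minimization.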

Once $\hat{P}_0$ has been found, the integer sub-multiples of $\hat{P}_0$ (i.e. $\hat{P}_0/2, \hat{P}_0/3, \ldots, \hat{P}_0/n$) must be
considered. Every sub-multiple which is greater than or equal to 21 is computed and replaced with the
closest member of the set {21, 21.5, ... 121.5, 122} (where closeness is measured with mean-square error).
Sub-multiples which are less than 21 are disregarded.
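
The generation of the sub-multiple candidates can be sketched in Python as follows. The function name `sub_multiples` is illustrative; the nearest grid member is found by minimizing the squared difference, as the text describes.

```python
GRID = [21 + 0.5 * i for i in range(203)]  # {21, 21.5, ..., 121.5, 122}

def sub_multiples(p0_hat):
    """Integer sub-multiples of p0_hat that are >= 21, snapped to the grid.

    The list is ordered from the largest sub-multiple (p0_hat / 2) down to
    the smallest one that is still at least 21.
    """
    result = []
    n = 2
    while p0_hat / n >= 21:
        candidate = p0_hat / n
        result.append(min(GRID, key=lambda g: (g - candidate) ** 2))
        n += 1
    return result
```

For a pitch of 100 the sub-multiples 50, 33.33 and 25 are retained (33.33 is replaced by 33.5), while 100/5 = 20 is below 21 and is disregarded.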

The smallest of these sub-multiples is checked against constraints (18), (19) and (20). If this
sub-multiple satisfies any of these constraints then it is selected as the forward pitch estimate, $\hat{P}_F$. Otherwise
the next largest sub-multiple is checked against these constraints, and it is selected as the forward pitch
estimate if it satisfies any of these constraints. This process continues until all pitch sub-multiples have been
tested against these constraints. If no pitch sub-multiple satisfies any of these constraints then $\hat{P}_F = \hat{P}_0$.
Note that this procedure will always select the smallest sub-multiple which satisfies any of these constraints
as the forward pitch estimate.

$$CE_F\!\left(\frac{\hat{P}_0}{n}\right) \le .85 \quad \text{and} \quad \frac{CE_F(\hat{P}_0/n)}{CE_F(\hat{P}_0)} \le 1.7 \tag{18}$$

$$CE_F\!\left(\frac{\hat{P}_0}{n}\right) \le .4 \quad \text{and} \quad \frac{CE_F(\hat{P}_0/n)}{CE_F(\hat{P}_0)} \le \ldots \tag{19}$$

$$CE_F\!\left(\frac{\hat{P}_0}{n}\right) \le .05 \tag{20}$$


Once the forward pitch estimate and the backward pitch estimate have both been computed the
forward cumulative error and the backward cumulative error are compared. Depending on the result of this
comparison either $\hat{P}_F$ or $\hat{P}_B$ will be selected as the initial pitch estimate $\hat{P}_I$. The following set of decision
rules is used to select the initial pitch estimate from among these two candidates:

If

$$CE_B(\hat{P}_B) \le .48, \text{ then } \hat{P}_I = \hat{P}_B \tag{21}$$

Else if

$$CE_B(\hat{P}_B) \le CE_F(\hat{P}_F), \text{ then } \hat{P}_I = \hat{P}_B \tag{22}$$

Else

$$\hat{P}_I = \hat{P}_F \tag{23}$$


Fig. 10: Pitch Refinement


The flow charts in Annex K should be examined for more information on initial pitch estimation. 

Note that the initial pitch estimate, $\hat{P}_I$, is a member of the set {21, 21.5, ... 121.5, 122}, and therefore it has
half-sample accuracy.
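
The decision rules (21) through (23) reduce to a short function. The Python sketch below assumes the comparisons are non-strict and reads the threshold in rule (21) as .48; both are interpretations of the printed rules, and the function name is illustrative.

```python
def select_initial_pitch(p_b, ce_b, p_f, ce_f, threshold=0.48):
    """Choose between the backward and forward pitch estimates.

    p_b, ce_b : backward estimate and its cumulative error
    p_f, ce_f : forward estimate and its cumulative error
    """
    if ce_b <= threshold:   # rule (21): backward estimate is good enough
        return p_b
    if ce_b <= ce_f:        # rule (22): backward estimate is the better one
        return p_b
    return p_f              # rule (23): fall back to the forward estimate
```

Note that rule (21) favours the backward estimate whenever its cumulative error is small, even if the forward cumulative error is smaller still.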
5.1.5 Pitch Refinement


The pitch refinement algorithm improves the resolution of the pitch estimate from one half sample to one
quarter sample. Ten candidate pitches are formed from the initial pitch estimate. These are
$\hat{P}_I - \frac{9}{8}$, $\hat{P}_I - \frac{7}{8}$, ..., $\hat{P}_I + \frac{7}{8}$, and $\hat{P}_I + \frac{9}{8}$. These candidates are converted to their equivalent
fundamental frequency using Equation (4). The error function $E_R(\omega_0)$, defined in Equation (24), is
evaluated for each candidate fundamental frequency $\omega_0$. The candidate fundamental frequency which
results in the minimum value of $E_R(\omega_0)$ is selected as the refined fundamental frequency $\hat{\omega}_0$. A block
diagram of this process is shown in Figure 10.

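$$E_R(\omega_0) = \sum_{m=50}^{\lfloor \lfloor \ldots - .5 \rfloor \ldots\, \omega_0 \rfloor} \left| S_w(m) - S_w(m, \omega_0) \right|^2 \tag{24}$$

The synthetic spectrum $S_w(m, \omega_0)$ is given by

$$S_w(m, \omega_0) = \begin{cases} A_0(\omega_0)\,W_R(64m) & \text{for } \lceil a_0 \rceil \le m < \lceil b_0 \rceil \\ A_1(\omega_0)\,W_R(\lfloor 64m - \frac{16384}{2\pi}\omega_0 + .5 \rfloor) & \text{for } \lceil a_1 \rceil \le m < \lceil b_1 \rceil \\ \quad\vdots & \\ A_l(\omega_0)\,W_R(\lfloor 64m - \frac{16384}{2\pi} l \omega_0 + .5 \rfloor) & \text{for } \lceil a_l \rceil \le m < \lceil b_l \rceil \end{cases} \tag{25}$$

where $a_l$, $b_l$ and $A_l$ are defined in equations (26) through (28), respectively. The notation $\lceil x \rceil$ denotes the
smallest integer greater than or equal to x.

$$a_l = \frac{256}{2\pi}(l - .5)\,\omega_0 \tag{26}$$

$$b_l = \frac{256}{2\pi}(l + .5)\,\omega_0 \tag{27}$$

$$A_l(\omega_0) = \frac{\sum_{m=\lceil a_l \rceil}^{\lceil b_l \rceil - 1} S_w(m)\,W_R^*(\lfloor 64m - \frac{16384}{2\pi} l \omega_0 + .5 \rfloor)}{\sum_{m=\lceil a_l \rceil}^{\lceil b_l \rceil - 1} \left| W_R(\lfloor 64m - \frac{16384}{2\pi} l \omega_0 + .5 \rfloor) \right|^2} \tag{28}$$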
The function $S_w(m)$ refers to the 256 point Discrete Fourier Transform of $s(n) \cdot w_R(n)$, and $W_R(m)$
refers to the 16384 point Discrete Fourier Transform of $w_R(n)$. These relationships are expressed below.
Reference [11] should be consulted for more information on the DFT.

$$S_w(m) = \sum_{n=-110}^{110} s(n)\,w_R(n)\,e^{-j\frac{2\pi m n}{256}} \quad \text{for } -127 \le m \le 128 \tag{29}$$

$$W_R(m) = \sum_{n=-110}^{110} w_R(n)\,e^{-j\frac{2\pi m n}{16384}} \quad \text{for } -8191 \le m \le 8192 \tag{30}$$

The notation $W_R^*(m)$ refers to the complex conjugate of $W_R(m)$. However, since $w_R(n)$ is a real
symmetric sequence, $W_R^*(m) = W_R(m)$.
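
Because the sums in equations (29) and (30) run over n = -110 ... 110 rather than from zero, an FFT routine needs the samples with negative index wrapped to the end of the buffer. The Python sketch below, which assumes NumPy, shows one way to do this; the function name `centered_dft` is illustrative, and the transform length must be at least the sequence length.

```python
import numpy as np

def centered_dft(x, n_fft):
    """DFT of a sequence x(n) stored for n = -H..H (len(x) == 2H + 1).

    Returns X with X[m % n_fft] = sum_n x(n) * exp(-2j*pi*m*n / n_fft),
    so negative frequency indices m are found at m + n_fft.
    """
    half = (len(x) - 1) // 2
    n = np.arange(-half, half + 1)
    buf = np.zeros(n_fft, dtype=complex)
    buf[n % n_fft] = x          # negative n wraps to the end of the buffer
    return np.fft.fft(buf)
```

With a 221 sample real symmetric window, `centered_dft(w, 16384)` returns values whose imaginary parts are zero to within rounding error, in agreement with the statement that the transform of the window equals its own complex conjugate.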


[Figure 11 labels: $\hat{\omega}_0$; $S_w(m)$; $v_k$, $1 \le k \le K$]


Fig. 11: IMBE Voiced/Unvoiced Determination


Once the refined fundamental frequency has been selected from among the ten candidates, it is
used to compute the number of harmonics in the current segment, L, according to the relationship:

$$L = \left\lfloor .9254 \left\lfloor \frac{\pi}{\hat{\omega}_0} + .25 \right\rfloor \right\rfloor \tag{31}$$

Due to the limits on $\hat{\omega}_0$, equation (31) confines L to the range $9 \le L \le 56$. Once this equation has been
computed, the parameters $a_l$ and $b_l$ for $1 \le l \le L$ are computed from $\hat{\omega}_0$ according to equations (32) and
(33), respectively.

$$a_l = \frac{256}{2\pi}(l - .5)\,\hat{\omega}_0 \tag{32}$$

$$b_l = \frac{256}{2\pi}(l + .5)\,\hat{\omega}_0 \tag{33}$$
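
The harmonic count and the band edge parameters can be computed as in the following Python sketch. The function names are illustrative; the constants .9254, .25 and 256 are those given in the relationships above.

```python
import math

def num_harmonics(omega0):
    """Number of harmonics L for a refined fundamental frequency in radians."""
    return math.floor(0.9254 * math.floor(math.pi / omega0 + 0.25))

def band_edges(omega0):
    """List of (a_l, b_l) pairs for l = 1..L, in units of 256-point DFT bins."""
    scale = 256.0 / (2.0 * math.pi)
    count = num_harmonics(omega0)
    return [(scale * (l - 0.5) * omega0, scale * (l + 0.5) * omega0)
            for l in range(1, count + 1)]
```

For a pitch period of 40 samples the fundamental frequency is 2π/40, the inner floor evaluates to 20, and the harmonic count is floor(18.508) = 18.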
5.2 Voiced/Unvoiced Determination

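The voiced/unvoiced (V/UV) decisions, $v_k$ for $1 \le k \le K$, are found by dividing the spectrum into K
frequency bands and evaluating a voicing measure, $D_k$, for each band. The number of frequency bands is a
function of L and is given by:

$$K = \begin{cases} \left\lfloor \frac{L+2}{3} \right\rfloor & \text{if } L \le 36 \\ 12 & \text{otherwise} \end{cases} \tag{34}$$

The voicing measure for $1 \le k \le K-1$ is given by

$$D_k = \frac{\sum_{m=\lceil a_{3k-2} \rceil}^{\lceil b_{3k} \rceil - 1} \left| S_w(m) - S_w(m, \hat{\omega}_0) \right|^2}{\sum_{m=\lceil a_{3k-2} \rceil}^{\lceil b_{3k} \rceil - 1} \left| S_w(m) \right|^2} \tag{35}$$

where $\hat{\omega}_0$ is the refined fundamental frequency, and $S_w(m)$ and $S_w(m, \hat{\omega}_0)$ are defined in section
5.1.5. Similarly, the voicing measure for the highest frequency band is given by

$$D_K = \frac{\sum_{m=\lceil a_{3K-2} \rceil}^{\lceil b_L \rceil - 1} \left| S_w(m) - S_w(m, \hat{\omega}_0) \right|^2}{\sum_{m=\lceil a_{3K-2} \rceil}^{\lceil b_L \rceil - 1} \left| S_w(m) \right|^2} \tag{36}$$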


The parameters $D_k$ for $1 \le k \le K$ are compared with a threshold function $\Theta_\xi(k, \hat{\omega}_0)$ given by:

$$\Theta_\xi(k, \hat{\omega}_0) = \begin{cases} 0 & \text{if } E(\hat{P}_I) > .5 \text{ and } k \ge 2 \\ .5625\,[1.0 - .3096(k-1)\hat{\omega}_0] \cdot M(\xi) & \text{else if } v_k(-1) = 1 \\ .45\,[1.0 - .3096(k-1)\hat{\omega}_0] \cdot M(\xi) & \text{otherwise} \end{cases} \tag{37}$$

where $M(\xi)$ is an energy dependent function which is computed from a set of local energy parameters and
$v_k(-1)$ is the value of the k'th V/UV decision for the previous frame. Evaluation of this threshold function
requires the parameters $\xi_{LF}$, $\xi_{HF}$, and $\xi_0$ to be computed for the current segment in the following manner,
where the value $W_R(0)$ is found by evaluating equation (30) at m = 0.


Zlf 

|5„,(m)| 2 

. iol^WI 2 

(38) 

£hf 

|5^(m)| 2 

(39) 

$$\xi_0 = \xi_{LF} + \xi_{HF} \qquad (40)$$
These parameters are then used to update the parameter $\xi_{max}$ according to the rules presented below. Throughout this section the notation $\xi_{max}(0)$ or $\xi_{max}$ is used to refer to the value of the parameter in the current frame, while the notation $\xi_{max}(-1)$ is used to refer to the value of the parameter in the previous frame.

$$\xi_{max}(0) = \begin{cases} .5\,\xi_{max}(-1) + .5\,\xi_0 & \text{if } \xi_0 > \xi_{max}(-1) \\ .99\,\xi_{max}(-1) + .01\,\xi_0 & \text{else if } .99\,\xi_{max}(-1) + .01\,\xi_0 > 20000 \\ 20000 & \text{otherwise} \end{cases} \qquad (41)$$
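The update rule of equation (41) behaves as a fast-attack, slow-decay energy tracker with a floor of 20000. A minimal Python sketch is shown below; the function name is hypothetical and is used only for illustration.

```python
def update_xi_max(xi_max_prev, xi_0):
    """Equation (41): update the tracked maximum energy for the current frame."""
    if xi_0 > xi_max_prev:
        # Current energy exceeds the tracked maximum: move halfway toward it.
        return 0.5 * xi_max_prev + 0.5 * xi_0
    # Otherwise decay slowly toward the current energy, never dropping below 20000.
    smoothed = 0.99 * xi_max_prev + 0.01 * xi_0
    return smoothed if smoothed > 20000 else 20000.0
```

For example, a previous value of 20000 followed by a frame energy of 40000 moves the tracked value to 30000, while a silent frame leaves it at the floor.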


The completed set of energy parameters for the current frame is used to calculate the function $M(\xi)$ shown below.


M(o = ! [■ 


0025 g TO ax+f0 > 

•01 ^mai+^0 j 


if iLF > 5 Z'HF 


( fe)' 8 


L<36, 3K-2<L<3K 



Fig. 12: IMBE Frequency Band Structure 


This function is then used in Equation (37) to calculate the V/UV threshold function. If $D_k$ is less than the threshold function then the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is declared voiced; otherwise this frequency band is declared unvoiced. A block diagram of this procedure is shown in Figure 11. The adopted convention is that if the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is declared voiced, then $\hat{v}_k = 1$. Alternatively, if the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is declared unvoiced, then $\hat{v}_k = 0$.

With the exception of the highest frequency band, the width of each frequency band is equal to $3\hat{\omega}_0$. Therefore all but the highest frequency band contain three harmonics of the refined fundamental frequency. The highest frequency band (as defined by Equation (34)) may contain more or less than three harmonics of the fundamental frequency. If a particular frequency band is declared voiced, then all of the harmonics within that band are defined to be voiced harmonics. Similarly, if a particular frequency band is declared unvoiced, then all of the harmonics within that band are defined to be unvoiced harmonics.

Fig. 13: IMBE Spectral Amplitude Estimation

5.3 Estimation of the Spectral Amplitudes

Once the V/UV decisions have been determined the spectral envelope can be estimated as shown in Figure 13. In the IMBE speech coder the spectral envelope in the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is specified by 3 spectral amplitudes, which are denoted by $\hat{M}_{3k-2}$, $\hat{M}_{3k-1}$, and $\hat{M}_{3k}$. The relationship between the frequency bands and the spectral amplitudes is shown in Figure 12. If the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is declared voiced, then $\hat{M}_{3k-2}$, $\hat{M}_{3k-1}$, and $\hat{M}_{3k}$ are estimated by,

gSiall IS-Wf 

for $l$ in the range $3k-2 \le l \le 3k$. Alternatively, if the frequency band $\hat{a}_{3k-2} \le \omega < \hat{b}_{3k}$ is declared unvoiced, then $\hat{M}_{3k-2}$, $\hat{M}_{3k-1}$, and $\hat{M}_{3k}$ are estimated according to:

1 1 i^wi 2 ] 1 

Hn=-no {\bi] ~ ("a/]) 

for $l$ in the range $3k-2 \le l \le 3k$.
This procedure must be modified slightly for the highest frequency band which covers the frequency interval $\hat{a}_{3\hat{K}-2} \le \omega < \hat{b}_{\hat{L}}$. The spectral envelope in this frequency band is represented by $\hat{L} - 3\hat{K} + 3$ spectral amplitudes, denoted $\hat{M}_{3\hat{K}-2}, \ldots, \hat{M}_{\hat{L}}$. If this frequency band is declared voiced then these spectral amplitudes are estimated using equation (43) for $3\hat{K}-2 \le l \le \hat{L}$. Alternatively, if this frequency band is declared unvoiced then these spectral amplitudes are estimated using equation (44) for $3\hat{K}-2 \le l \le \hat{L}$.

As described above, the spectral amplitudes $\hat{M}_l$ are estimated in the range $1 \le l \le \hat{L}$, where $\hat{L}$ is given in Equation (31). Note that the lowest frequency band, $\hat{a}_1 \le \omega < \hat{b}_3$, is specified by $\hat{M}_1$, $\hat{M}_2$, and $\hat{M}_3$. The D.C. spectral amplitude, $\hat{M}_0$, is ignored in the IMBE speech coder and can be assumed to be zero.


Parameter                     Number of Bits
Fundamental Frequency         8
Voiced/Unvoiced Decisions     $\hat{K}$
Spectral Amplitudes           $79 - \hat{K}$
Synchronization               1

Table 1: Bit Allocation Among Model Parameters


6 Parameter Encoding and Decoding 

The analysis of each speech frame generates a set of model parameters consisting of the fundamental frequency, $\hat{\omega}_0$, the V/UV decisions, $\hat{v}_k$ for $1 \le k \le \hat{K}$, and the spectral amplitudes, $\hat{M}_l$ for $1 \le l \le \hat{L}$.
Since the speech coder is designed to operate at 7.1 kbps with a 20 ms frame length, 142 bits per frame are 
available for encoding the model parameters. Of these 142 bits, 54 are reserved for error control as is 
discussed in Section 7 of this document, and the remaining 88 bits are divided among the model parameters 
as shown in Table 1. This section describes the manner in which these bits are used to quantize, encode, 
decode and reconstruct the model parameters. In Section 6.1 the encoding and decoding of the fundamental 
frequency is discussed, while Section 6.2 discusses the encoding and decoding of the V/UV decisions. 
Section 6.3 discusses the quantization and encoding of the spectral amplitudes, and Section 6.4 discusses 
the decoding and reconstruction of the spectral amplitudes. Reference [7] provides general information on 
many of the techniques used in this section. 
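The bit budget described above can be checked with simple arithmetic. The following Python sketch uses only the figures quoted in this section and in Table 1; the constant and function names are hypothetical.

```python
BIT_RATE_BPS = 7100      # 7.1 kbps
FRAME_MS = 20            # frame length in milliseconds
FEC_BITS = 54            # bits reserved for error control

bits_per_frame = BIT_RATE_BPS * FRAME_MS // 1000
voice_bits = bits_per_frame - FEC_BITS

def table1_allocation(K):
    """Bits per model parameter from Table 1 for K voiced/unvoiced decisions."""
    return {"fundamental": 8, "vuv": K, "amplitudes": 79 - K, "sync": 1}
```

Whatever the number of V/UV decisions, the four entries of Table 1 always add up to the 88 bits left after error control.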
6.1 Fundamental Frequency Encoding and Decoding

The fundamental frequency is estimated with one-quarter sample resolution in the interval $\frac{2\pi}{123.125} \le \hat{\omega}_0 \le \frac{2\pi}{19.875}$; however, it is only encoded at half-sample resolution. This is accomplished by finding the value of $\hat{b}_0$ which satisfies:

$$\hat{b}_0 = \lfloor 4\pi/\hat{\omega}_0 - 39 \rfloor \qquad (45)$$


value     bits
0         0000 0000
1         0000 0001
2         0000 0010
...       ...
255       1111 1111

Table 2: Eight Bit Binary Representation


The quantizer value $\hat{b}_0$ is represented with 8 bits using the unsigned binary representation shown in Table 2. This representation is used throughout the IMBE speech coder to convert quantized values into a specific bit pattern.

The fundamental frequency is decoded and reconstructed at the receiver by using Equation (46) to convert $\tilde{b}_0$ to the received fundamental frequency $\tilde{\omega}_0$. In addition $\tilde{b}_0$ is used to calculate $\tilde{K}$ and $\tilde{L}$, the number of V/UV decisions and the number of spectral amplitudes, respectively. These relationships are given in Equations (47) and (48).

$$\tilde{\omega}_0 = \frac{4\pi}{\tilde{b}_0 + 39.5} \qquad (46)$$

$$\tilde{L} = \lfloor .9254 \lfloor \pi/\tilde{\omega}_0 + .25 \rfloor \rfloor \qquad (47)$$

$$\tilde{K} = \begin{cases} \lfloor (\tilde{L}+2)/3 \rfloor & \text{if } \tilde{L} \le 36 \\ 12 & \text{otherwise} \end{cases} \qquad (48)$$


Since $\tilde{K}$ and $\tilde{L}$ control subsequent bit allocation by the receiver, it is important that they equal $\hat{K}$ and $\hat{L}$, respectively. This occurs if there are no uncorrectable bit errors in the six most significant bits (MSB) of $\hat{b}_0$. For this reason these six bits are well protected by the error correction scheme discussed in Section 7. A block diagram of the fundamental frequency encoding and decoding process is shown in Figure 14.

Since the pitch estimation algorithm described in Section 5.1 restricts the range of $\hat{\omega}_0$ to $\frac{2\pi}{123.125} \le \hat{\omega}_0 \le \frac{2\pi}{19.875}$, the value of $\hat{b}_0$ computed according to Equation (45) is limited to the range $0 \le \hat{b}_0 \le 207$. The use of 8 bits to represent $\hat{b}_0$ leaves 48 values of $\hat{b}_0$ (i.e. $208 \le \hat{b}_0 \le 255$) which are outside the valid range of pitch values. These 48 values are reserved for future use.
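The encoding and decoding relationships of Equations (45) through (48) can be illustrated with the following Python sketch. The function names are hypothetical; the sketch only restates the equations and checks the range and the role of the six most significant bits described above.

```python
import math

def encode_b0(w0):
    """Equation (45): quantizer value for a fundamental frequency w0 (radians/sample)."""
    return math.floor(4 * math.pi / w0 - 39)

def decode_b0(b0):
    """Equations (46) to (48): received fundamental, number of harmonics, number of bands."""
    w0 = 4 * math.pi / (b0 + 39.5)
    L = math.floor(0.9254 * math.floor(math.pi / w0 + 0.25))
    K = (L + 2) // 3 if L <= 36 else 12
    return w0, L, K

# The ends of the pitch interval map to the ends of the valid quantizer range.
b0_low = encode_b0(2 * math.pi / 19.875)
b0_high = encode_b0(2 * math.pi / 123.125)
```

Clearing the two least significant bits of any valid quantizer value leaves the decoded number of harmonics unchanged, which is why protecting the six most significant bits is sufficient to keep the bit allocation of encoder and decoder aligned.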



Fig. 14: Fundamental Frequency Encoding and Decoding 


6.2 Voiced/Unvoiced Decision Encoding and Decoding 

The V/UV decisions $\hat{v}_k$, for $1 \le k \le \hat{K}$, are binary values which classify each frequency band as either voiced or unvoiced. These values are encoded as the quantizer value $\hat{b}_1$ defined in Equation (49), which is represented with $\hat{K}$ bits using an unsigned binary representation analogous to that shown in Table 2.

$$\hat{b}_1 = \sum_{k=1}^{\hat{K}} \hat{v}_k\, 2^{\hat{K}-k} \qquad (49)$$

At the receiver the $\tilde{K}$ bits corresponding to $\tilde{b}_1$ are decoded into the V/UV decisions $\tilde{v}_l$ for $1 \le l \le \tilde{L}$. Note that this is a departure from the V/UV convention used by the encoder, which used a single V/UV decision to represent an entire frequency band. Instead the decoder uses a separate V/UV decision for each spectral amplitude. The decoder performs this conversion by using $\tilde{b}_1$ to determine which frequency bands are voiced or unvoiced. The state of $\tilde{v}_l$ is then set depending upon whether the frequency $\omega = l \cdot \tilde{\omega}_0$ is within a voiced or unvoiced frequency band. This can be expressed mathematically as shown in the following two equations.

$$\kappa_l = \begin{cases} \lfloor (l+2)/3 \rfloor & \text{if } l \le 36 \\ 12 & \text{otherwise} \end{cases} \qquad (50)$$

$$\tilde{v}_l = \left\lfloor \frac{\tilde{b}_1}{2^{\tilde{K}-\kappa_l}} \right\rfloor - 2 \left\lfloor \frac{\tilde{b}_1}{2^{\tilde{K}+1-\kappa_l}} \right\rfloor \quad \text{for } 1 \le l \le \tilde{L} \qquad (51)$$

Fig. 15: V/UV Decision Encoding and Decoding

Fig. 16: Encoding of the Spectral Amplitudes

Figure 15 shows a block diagram of the V/UV decision encoding and decoding process.
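The V/UV encoding and decoding described in this section amounts to packing one bit per band and then expanding it to one decision per harmonic. The Python sketch below illustrates this under the assumption that the decoder selects, for harmonic l, the bit of the received value that belongs to band floor((l+2)/3) (capped at 12); the function names are hypothetical.

```python
def encode_vuv(v):
    """Pack band decisions v[0..K-1] (band k = 1 first) into one K-bit value, MSB first."""
    K = len(v)
    return sum(bit << (K - k) for k, bit in enumerate(v, start=1))

def decode_vuv(b1, K, L):
    """Expand the K-bit value b1 into one voiced/unvoiced decision per harmonic 1..L."""
    decisions = []
    for l in range(1, L + 1):
        band = (l + 2) // 3 if l <= 36 else 12   # band containing harmonic l
        decisions.append((b1 >> (K - band)) & 1)  # bit of b1 belonging to that band
    return decisions

# Three bands (voiced, unvoiced, voiced) covering nine harmonics.
b1 = encode_vuv([1, 0, 1])
per_harmonic = decode_vuv(b1, K=3, L=9)
```

In this example the three band decisions expand to three identical decisions per band, which is the behavior described for the decoder.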
6.3 Spectral Amplitudes Encoding

The spectral amplitudes $\hat{M}_l$, for $1 \le l \le \hat{L}$, are real values which must be quantized prior to encoding. This is accomplished as shown in Figure 16, by forming the spectral amplitude prediction residuals $\hat{T}_l$ for $1 \le l \le \hat{L}$, according to Equations (52) through (57). The reader is referred to [6] for more information on this topic.

For the purpose of this discussion $\hat{L}(0)$ or $\hat{L}$ refer to the number of harmonics in the current frame, while $\hat{L}(-1)$ refers to the number of harmonics in the previous frame. Similarly, $\hat{M}_l(0)$ for $1 \le l \le \hat{L}(0)$ refers to the unquantized spectral amplitudes of the current frame, while $\tilde{M}_l(-1)$ for $1 \le l \le \hat{L}(-1)$ refers to the quantized spectral amplitudes of the previous frame.

$$\hat{k}_l = \frac{\hat{L}(-1)}{\hat{L}(0)} \cdot l \qquad (52)$$

$$\hat{\delta}_l = \hat{k}_l - \lfloor \hat{k}_l \rfloor \qquad (53)$$

$$\hat{T}_l = \log_2 \hat{M}_l(0) - \rho(1-\hat{\delta}_l)\log_2 \tilde{M}_{\lfloor \hat{k}_l \rfloor}(-1) - \rho\,\hat{\delta}_l \log_2 \tilde{M}_{\lfloor \hat{k}_l \rfloor + 1}(-1) + \frac{\rho}{\hat{L}(0)} \sum_{\lambda=1}^{\hat{L}(0)} \left[ (1-\hat{\delta}_\lambda)\log_2 \tilde{M}_{\lfloor \hat{k}_\lambda \rfloor}(-1) + \hat{\delta}_\lambda \log_2 \tilde{M}_{\lfloor \hat{k}_\lambda \rfloor + 1}(-1) \right] \qquad (54)$$

The prediction coefficient, $\rho$, is adjusted each frame according to the following rule:

$$\rho = \begin{cases} .4 & \text{if } \hat{L}(0) \le 15 \\ .03\,\hat{L}(0) - .05 & \text{if } 15 < \hat{L}(0) \le 24 \\ .7 & \text{otherwise} \end{cases} \qquad (55)$$

In order to form $\hat{T}_l$ using equations (52) through (55), the following assumptions are made:

$$\tilde{M}_0(-1) = 1.0 \qquad (56)$$

$$\tilde{M}_l(-1) = \tilde{M}_{\hat{L}(-1)}(-1) \quad \text{for } l > \hat{L}(-1) \qquad (57)$$

Also upon initialization $\tilde{M}_l(-1)$ should be set equal to 1.0 for all $l$, and $\hat{L}(-1) = 30$.
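The formation of the prediction residuals can be sketched as follows. This Python sketch assumes the reading of Equations (52) through (57) in which the previous frame's log amplitudes are linearly interpolated onto the current harmonic grid, scaled by the prediction coefficient, and have their mean removed. The function names and the list layout (index 0 holding the D.C. term of the previous frame) are hypothetical.

```python
import math

def rho(L):
    """Prediction coefficient as a function of the number of harmonics (Equation (55))."""
    if L <= 15:
        return 0.4
    if L <= 24:
        return 0.03 * L - 0.05
    return 0.7

def prediction_residuals(M_cur, M_prev):
    """M_cur: [M_1(0)..M_L(0)]; M_prev: [M_0(-1) = 1.0, M_1(-1), ..., M_Lprev(-1)]."""
    L0, Lp = len(M_cur), len(M_prev) - 1
    p = rho(L0)

    def log_prev(idx):
        # Indices beyond the previous frame reuse its last amplitude (Equation (57)).
        return math.log2(M_prev[min(idx, Lp)])

    interp = []
    for l in range(1, L0 + 1):
        k = Lp / L0 * l
        fl = math.floor(k)
        d = k - fl
        interp.append((1 - d) * log_prev(fl) + d * log_prev(fl + 1))
    mean = sum(interp) / L0
    return [math.log2(M_cur[i]) - p * interp[i] + p * mean for i in range(L0)]

# Initial state: all previous amplitudes equal to 1.0 and 30 previous harmonics.
initial_prev = [1.0] * 31
```

With the initialization values given above, the interpolated prediction is zero and each residual is simply the base-2 logarithm of the current amplitude.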

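The $\hat{L}$ prediction residuals are then divided into 6 blocks. The length of each block, denoted $\hat{J}_i$ for $1 \le i \le 6$, is adjusted such that the following constraints are satisfied:

$$\sum_{i=1}^{6} \hat{J}_i = \hat{L} \qquad (58)$$

$$\left\lfloor \frac{\hat{L}}{6} \right\rfloor \le \hat{J}_i \le \hat{J}_{i+1} \le \left\lceil \frac{\hat{L}}{6} \right\rceil \quad \text{for } 1 \le i \le 5 \qquad (59)$$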
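$\hat{L} = 34$

Block     Elements            Residuals                          Length
Block 1   $\hat{c}_{1,j}$     $\hat{T}_1 \ldots \hat{T}_5$       $\hat{J}_1 = 5$
Block 2   $\hat{c}_{2,j}$     $\hat{T}_6 \ldots \hat{T}_{10}$    $\hat{J}_2 = 5$
Block 3   $\hat{c}_{3,j}$     $\hat{T}_{11} \ldots \hat{T}_{16}$ $\hat{J}_3 = 6$
Block 4   $\hat{c}_{4,j}$     $\hat{T}_{17} \ldots \hat{T}_{22}$ $\hat{J}_4 = 6$
Block 5   $\hat{c}_{5,j}$     $\hat{T}_{23} \ldots \hat{T}_{28}$ $\hat{J}_5 = 6$
Block 6   $\hat{c}_{6,j}$     $\hat{T}_{29} \ldots \hat{T}_{34}$ $\hat{J}_6 = 6$

(Block 1 is the lowest frequency block; Block 6 is the highest frequency block.)

Fig. 17: Prediction Residual Blocks for L = 34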
The table shown in Annex J lists the six block lengths for all possible values of $\hat{L}$. The first or lowest frequency block is denoted by $\hat{c}_{1,j}$ for $1 \le j \le \hat{J}_1$, and it consists of the first $\hat{J}_1$ consecutive elements of $\hat{T}_l$ (i.e. $1 \le l \le \hat{J}_1$). The second block is denoted by $\hat{c}_{2,j}$ for $1 \le j \le \hat{J}_2$, and it consists of the next $\hat{J}_2$ consecutive elements of $\hat{T}_l$ (i.e. $\hat{J}_1 + 1 \le l \le \hat{J}_1 + \hat{J}_2$). This continues through the sixth or highest frequency block, which is denoted by $\hat{c}_{6,j}$ for $1 \le j \le \hat{J}_6$. It consists of the last $\hat{J}_6$ consecutive elements of $\hat{T}_l$ (i.e. $\hat{L} + 1 - \hat{J}_6 \le l \le \hat{L}$). An example of this process is shown in Figure 17 for $\hat{L} = 34$.
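The block lengths can also be derived directly from the two constraints, since lengths that differ by at most one, never decrease, and sum to the number of harmonics are unique. The Python sketch below takes that approach; it is not a substitute for the Annex J table, and the function names are hypothetical.

```python
def block_lengths(L):
    """Six non-decreasing block lengths between floor(L/6) and ceil(L/6) that sum to L."""
    base, extra = divmod(L, 6)
    # The 'extra' longer blocks are placed last so that the lengths never decrease.
    return [base] * (6 - extra) + [base + 1] * extra

def split_blocks(T):
    """Split the residual vector T (T_1 first) into six consecutive blocks."""
    blocks, start = [], 0
    for J in block_lengths(len(T)):
        blocks.append(T[start:start + J])
        start += J
    return blocks

lengths_34 = block_lengths(34)
```

For 34 harmonics this yields the lengths 5, 5, 6, 6, 6, 6 shown in Figure 17.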

Each of the six blocks is transformed using a Discrete Cosine Transform (DCT), which is discussed in [7]. The length of the DCT for the i'th block is equal to $\hat{J}_i$. The DCT coefficients are denoted by $\hat{C}_{i,k}$, where $1 \le i \le 6$ refers to the block number, and $1 \le k \le \hat{J}_i$ refers to the particular coefficient within each block. The formula for the computation of these DCT coefficients is as follows:

$$\hat{C}_{i,k} = \frac{1}{\hat{J}_i} \sum_{j=1}^{\hat{J}_i} \hat{c}_{i,j} \cos\left[ \frac{\pi (k-1)(j - \frac{1}{2})}{\hat{J}_i} \right] \quad \text{for } 1 \le k \le \hat{J}_i \qquad (60)$$

The DCT coefficients from each of the six blocks are then divided into two groups. The first group consists of the first DCT coefficient from each of the six blocks. These coefficients are used to form a six element vector, $\hat{R}_i$ for $1 \le i \le 6$, where $\hat{R}_i = \hat{C}_{i,1}$. The vector $\hat{R}_i$ is referred to as the gain vector, and its construction is shown in Figure 18. The quantization of the gain vector is discussed in section 6.3.1.

The second group consists of the remaining higher order DCT coefficients. These coefficients correspond to $\hat{C}_{i,j}$, where $1 \le i \le 6$ and $2 \le j \le \hat{J}_i$. Note that if $\hat{J}_i = 1$, then there are no higher order DCT coefficients in the i'th block. The quantization of the higher order DCT coefficients is discussed in section 6.3.2.
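The block transform and its inverse can be sketched in Python as shown below. The forward transform follows the scaled DCT form of Equation (60) with a half-sample offset in the cosine argument, and the inverse uses the weighting of 1 for the first coefficient and 2 for the others given in Equations (73) and (74). The function names are hypothetical.

```python
import math

def dct_block(c):
    """Forward DCT of one block c (c_1 first), scaled by 1/J."""
    J = len(c)
    return [
        sum(c[j - 1] * math.cos(math.pi * (k - 1) * (j - 0.5) / J) for j in range(1, J + 1)) / J
        for k in range(1, J + 1)
    ]

def idct_block(C):
    """Inverse DCT of one block of coefficients C (C_1 first)."""
    J = len(C)
    return [
        sum((1 if k == 1 else 2) * C[k - 1] * math.cos(math.pi * (k - 1) * (j - 0.5) / J)
            for k in range(1, J + 1))
        for j in range(1, J + 1)
    ]

block = [1.0, 2.0, 0.5, -1.0, 3.0]
coeffs = dct_block(block)
restored = idct_block(coeffs)
```

With this scaling the first coefficient of each block is the mean of the block, which is why the vector of first coefficients serves as a coarse gain description.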
[Figure: for $\hat{L} = 34$, Block 1 ($\hat{J}_1 = 5$), Block 2 ($\hat{J}_2 = 5$), through Block 6 ($\hat{J}_6 = 6$) feed the Gain Vector, which is converted to the Transformed Gain Vector]

Fig. 18: Formation of Gain Vector

One important feature of the spectral amplitude encoding algorithm is that the spectral amplitude information is transmitted differentially. Specifically, a prediction residual is transmitted which measures the change in the spectral envelope between the current frame and the previous frame. In order for a differential scheme of this type to work properly, the encoder must simulate the operation of the decoder and use the reconstructed spectral amplitudes from the previous frame to predict the spectral amplitudes of the current frame. The IMBE spectral amplitude encoder simulates the spectral amplitude decoder by setting $\tilde{L} = \hat{L}$ and then reconstructing the spectral amplitudes as discussed above. This is shown as the feedback path in Figure 16.


6.3.1 Encoding the Gain Vector 

The gain vector can be viewed as a coarse representation of the spectral envelope of the current segment of speech. The quantization of the gain vector begins with a six point DCT of $\hat{R}_i$ for $1 \le i \le 6$ as shown in the following equation.

$$\hat{G}_m = \frac{1}{6} \sum_{i=1}^{6} \hat{R}_i \cos\left[ \frac{\pi (m-1)(i - \frac{1}{2})}{6} \right] \quad \text{for } 1 \le m \le 6 \qquad (61)$$

The resulting vector, denoted by $\hat{G}_m$ for $1 \le m \le 6$, is quantized in two parts. The first element, $\hat{G}_1$, can be viewed as representing the overall gain or level of the speech segment. This element is quantized using the 6 bit non-uniform quantizer given in Annex E. The 6 bit value $\hat{b}_2$ is defined as the index of the quantizer value (as shown in Annex E) which is nearest to $\hat{G}_1$. The remaining five elements of $\hat{G}_m$ are quantized using uniform scalar quantizers where the five quantizer values $\hat{b}_3$ to $\hat{b}_7$ are computed from the vector elements as shown in Equation (62).

$$\hat{b}_m = \begin{cases} 0 & \text{if } \lfloor \hat{G}_{m-1}/\hat{\Delta}_m \rfloor < -2^{\hat{B}_m - 1} \\ 2^{\hat{B}_m} - 1 & \text{if } \lfloor \hat{G}_{m-1}/\hat{\Delta}_m \rfloor \ge 2^{\hat{B}_m - 1} \\ \lfloor \hat{G}_{m-1}/\hat{\Delta}_m \rfloor + 2^{\hat{B}_m - 1} & \text{otherwise} \end{cases} \quad \text{for } 3 \le m \le 7 \qquad (62)$$

The parameters $\hat{B}_m$ and $\hat{\Delta}_m$ in Equation (62) are the number of bits and the step sizes used to quantize each element. These values are dependent upon $\hat{L}$, which is the number of harmonics in the current frame. This dependence is tabulated in Annex F. Since $\hat{L}$ is known by the encoder, the correct values of $\hat{B}_m$ and $\hat{\Delta}_m$ are first obtained using Annex F and then the quantizer values $\hat{b}_m$ for $3 \le m \le 7$ are computed using Equation (62). The final step is to convert each quantizer value into an unsigned binary representation using the same method as shown in Table 2.
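The uniform scalar quantizer of Equation (62) can be sketched as follows, together with the matching reconstruction rule of Equation (68). The bit counts and step sizes come from Annex F, which is not reproduced here, so the values used in the example are made up for illustration. Returning 0 for a zero-bit allocation in the encoder is an assumption of this sketch, and the function names are hypothetical.

```python
import math

def quantize_uniform(x, bits, step):
    """Clamp floor(x / step) to the signed range and offset it to an unsigned code."""
    if bits == 0:
        return 0                      # assumption: nothing is transmitted for 0 bits
    q = math.floor(x / step)
    half = 1 << (bits - 1)
    if q < -half:
        return 0
    if q >= half:
        return (1 << bits) - 1
    return q + half

def dequantize_uniform(b, bits, step):
    """Reconstruct the mid-point of the quantizer cell selected by code b."""
    if bits == 0:
        return 0.0
    return step * (b - (1 << (bits - 1)) + 0.5)

code = quantize_uniform(0.25, bits=4, step=0.125)
value = dequantize_uniform(code, bits=4, step=0.125)
```

For inputs inside the quantizer range the reconstruction error is at most half a step; inputs outside the range saturate at the lowest or highest code.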


6.3.2 Encoding the Higher Order DCT Coefficients 


Once the gain vector has been quantized, the remaining bits are used to encode the $\hat{L} - 6$ higher order DCT coefficients which complete the representation of the spectral amplitudes. Annex G shows the bit allocation as a function of $\hat{L}$ for these coefficients. For each value of $\hat{L}$ the $\hat{L} - 6$ entries, labeled $\hat{b}_8$ through $\hat{b}_{\hat{L}+1}$, provide the bit allocation for the higher order DCT coefficients. The adopted convention is that $[\hat{b}_8, \hat{b}_9, \ldots, \hat{b}_{\hat{L}+1}]$ correspond to $[\hat{C}_{1,2}, \hat{C}_{1,3}, \ldots, \hat{C}_{1,\hat{J}_1}, \ldots, \hat{C}_{6,2}, \hat{C}_{6,3}, \ldots, \hat{C}_{6,\hat{J}_6}]$, respectively.

Once the bit allocation for the higher order DCT coefficients has been obtained, these coefficients are quantized using uniform quantization. The step size used to quantize each coefficient must be computed from the bit allocation and the standard deviation of the DCT coefficient using Tables 3 and 4. For example, if 4 bits are allocated for a particular coefficient, then from Table 3 the step size, $\hat{\Delta}$, equals $.40\sigma$.
Number of Bits     Step Size
1                  $1.2\sigma$
2                  $.85\sigma$
3                  $.65\sigma$
4                  $.40\sigma$
5                  $.28\sigma$
6                  $.15\sigma$
7                  $.08\sigma$
8                  $.04\sigma$
9                  $.02\sigma$
10                 $.01\sigma$

Table 3: Uniform Quantizer Step Size for Higher Order DCT Coefficients
If this was the third DCT coefficient from any block (i.e. $\hat{C}_{i,3}$), then $\sigma = .241$ as shown in Table 4. Performing this multiplication gives a step size of .0964. Once the bit allocation and the step sizes for the higher order DCT coefficients have been determined, then the bit encodings $\hat{b}_m$ for $8 \le m \le \hat{L}+1$ are computed according to Equation (63).

$$\hat{b}_m = \begin{cases} 0 & \text{if } \lfloor \hat{C}_{i,k}/\hat{\Delta}_m \rfloor < -2^{\hat{B}_m - 1} \\ 2^{\hat{B}_m} - 1 & \text{if } \lfloor \hat{C}_{i,k}/\hat{\Delta}_m \rfloor \ge 2^{\hat{B}_m - 1} \\ \lfloor \hat{C}_{i,k}/\hat{\Delta}_m \rfloor + 2^{\hat{B}_m - 1} & \text{otherwise} \end{cases} \quad \text{for } 8 \le m \le \hat{L}+1 \qquad (63)$$

The parameters $\hat{b}_m$, $\hat{B}_m$ and $\hat{\Delta}_m$ in equation (63) refer to the quantizer value, the number of bits and the step size which has been computed for $\hat{C}_{i,k}$, respectively. Note that the relationship between $m$, $i$, and $k$ in Equation (63) is known and can be expressed as:

$$m = 6 + k + \sum_{n=1}^{i-1} \hat{J}_n \qquad (64)$$

Finally, each quantizer value is converted into the appropriate unsigned binary representation which is analogous to that shown in Table 2.
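The step size computation from Tables 3 and 4 is a simple product, sketched below in Python. The two dictionaries transcribe the tables of this section; the names are hypothetical.

```python
# Table 3: step size as a multiple of sigma, indexed by the number of allocated bits.
STEP_FACTOR = {1: 1.2, 2: 0.85, 3: 0.65, 4: 0.40, 5: 0.28,
               6: 0.15, 7: 0.08, 8: 0.04, 9: 0.02, 10: 0.01}

# Table 4: standard deviation of the k'th DCT coefficient of any block (k = 2..10).
SIGMA = {2: 0.307, 3: 0.241, 4: 0.207, 5: 0.190, 6: 0.179,
         7: 0.173, 8: 0.165, 9: 0.170, 10: 0.170}

def step_size(bits, k):
    """Quantizer step size for the k'th DCT coefficient of a block with 'bits' bits."""
    return STEP_FACTOR[bits] * SIGMA[k]

example = step_size(4, 3)
```

The example reproduces the case worked in the text: 4 bits for a third DCT coefficient gives a step size of .0964.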
DCT Coefficient      $\sigma$
$C_{i,2}$            .307
$C_{i,3}$            .241
$C_{i,4}$            .207
$C_{i,5}$            .190
$C_{i,6}$            .179
$C_{i,7}$            .173
$C_{i,8}$            .165
$C_{i,9}$            .170
$C_{i,10}$           .170

Table 4: Standard Deviation of Higher Order DCT Coefficients

Fig. 19: Decoding of the Spectral Amplitudes


6.4 Spectral Amplitudes Decoding 


In order for the decoder to reconstruct the spectral amplitudes, the parameter $\tilde{L}$ must first be computed from $\tilde{b}_0$ using Equations (46) and (47). Then the spectral amplitudes can be decoded and reconstructed by inverting the quantization and encoding procedure described above. A block diagram of the spectral amplitude decoder is shown in Figure 19.

The first step in the spectral amplitude reconstruction process is to divide the spectral amplitudes into six blocks. The length of each block, $\tilde{J}_i$ for $1 \le i \le 6$, is adjusted to meet the following constraints.

$$\sum_{i=1}^{6} \tilde{J}_i = \tilde{L} \qquad (65)$$

$$\left\lfloor \frac{\tilde{L}}{6} \right\rfloor \le \tilde{J}_i \le \tilde{J}_{i+1} \le \left\lceil \frac{\tilde{L}}{6} \right\rceil \quad \text{for } 1 \le i \le 5 \qquad (66)$$

The elements of these blocks are denoted by $\tilde{C}_{i,k}$, where $1 \le i \le 6$ denotes the block number and where $1 \le k \le \tilde{J}_i$ denotes the element within that block. The first element of each block is then set equal to the decoded gain vector, $\tilde{R}_i$, via equation (67). The formation of the decoded gain vector is discussed in Section 6.4.1.

$$\tilde{C}_{i,1} = \tilde{R}_i \quad \text{for } 1 \le i \le 6 \qquad (67)$$

The remaining elements of each block correspond to the decoded higher order DCT coefficients which are discussed in Section 6.4.2.


6.4.1 Decoding the Gain Vector 


The gain is decoded in two parts. First the six bit value $\tilde{b}_2$ is used to decode the first element of the transformed gain vector, denoted by $\tilde{G}_1$. This is done by using the 6 bit value $\tilde{b}_2$ as an index into the quantizer values listed in Annex E. Next the five quantizer values $\tilde{b}_3$ through $\tilde{b}_7$ are used to reconstruct the remaining five elements of the transformed gain vector, denoted by $\tilde{G}_2$ through $\tilde{G}_6$. This is done by using $\tilde{L}$, the number of harmonics in the current frame, in combination with the table in Annex F to establish the bit allocation and step size for each of these five elements. The relationship between the quantizer values and the transformed gain vector elements is expressed in Equation (68),

$$\tilde{G}_{m-1} = \begin{cases} 0 & \text{if } \tilde{B}_m = 0 \\ \tilde{\Delta}_m (\tilde{b}_m - 2^{\tilde{B}_m - 1} + .5) & \text{otherwise} \end{cases} \quad \text{for } 3 \le m \le 7 \qquad (68)$$

where $\tilde{\Delta}_m$ and $\tilde{B}_m$ are the step sizes and the number of bits found via Annex F. Once the transformed gain vector has been reconstructed in this manner, the gain vector $\tilde{R}_i$ for $1 \le i \le 6$ must be computed through an inverse DCT of $\tilde{G}_m$ as shown in the following equations.

$$\tilde{R}_i = \sum_{m=1}^{6} \alpha(m)\, \tilde{G}_m \cos\left[ \frac{\pi (m-1)(i - \frac{1}{2})}{6} \right] \quad \text{for } 1 \le i \le 6 \qquad (69)$$

$$\alpha(m) = \begin{cases} 1 & \text{if } m = 1 \\ 2 & \text{otherwise} \end{cases} \qquad (70)$$
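The two-part gain decoding can be sketched in Python as follows. The sketch assumes that the first transformed gain element has already been looked up in Annex E and that the bit counts and step sizes for the remaining five elements have been read from Annex F; neither annex is reproduced here, so the example inputs are made up. The function name is hypothetical.

```python
import math

def decode_gain_vector(G1, codes, bits, steps):
    """Rebuild the six-element gain vector from the transformed gain elements.

    G1: first transformed gain element (from the Annex E lookup, assumed done).
    codes, bits, steps: five-element lists for m = 3..7 (assumed taken from Annex F).
    """
    G = [G1]
    for b, B, delta in zip(codes, bits, steps):
        # Zero-bit elements are reconstructed as zero, others as the cell mid-point.
        G.append(0.0 if B == 0 else delta * (b - 2 ** (B - 1) + 0.5))
    return [
        sum((1 if m == 1 else 2) * G[m - 1] * math.cos(math.pi * (m - 1) * (i - 0.5) / 6)
            for m in range(1, 7))
        for i in range(1, 7)
    ]

flat = decode_gain_vector(3.0, [0] * 5, [0] * 5, [1.0] * 5)
shaped = decode_gain_vector(2.0, [3, 0, 1, 2, 1], [2, 1, 1, 2, 1], [0.5, 0.25, 0.25, 0.5, 0.1])
```

When all five higher elements are zero the gain vector is flat and equal to the first element; in general the mean of the six gain values equals the first transformed element.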


6.4.2 Decoding the Higher Order DCT Coefficients 

The higher order DCT coefficients, which are denoted by $\tilde{C}_{i,k}$ for $1 \le i \le 6$ and $2 \le k \le \tilde{J}_i$, are reconstructed from the quantizer values $\tilde{b}_8, \tilde{b}_9, \ldots, \tilde{b}_{\tilde{L}+1}$. First the bit allocation table listed in Annex G is used to determine the appropriate bit allocation. The adopted convention is that $[\tilde{b}_8, \tilde{b}_9, \ldots, \tilde{b}_{\tilde{L}+1}]$ correspond to $[\tilde{C}_{1,2}, \tilde{C}_{1,3}, \ldots, \tilde{C}_{1,\tilde{J}_1}, \ldots, \tilde{C}_{6,2}, \tilde{C}_{6,3}, \ldots, \tilde{C}_{6,\tilde{J}_6}]$, respectively. Once the bit allocation has been determined the step sizes for each $\tilde{C}_{i,k}$ are computed using Tables 3 and 4. The determination of the bit allocation and the step sizes proceeds in the same manner as is discussed in Section 6.3.2. Using the notation $\tilde{B}_m$ and $\tilde{\Delta}_m$ to denote the number of bits and the step size, respectively, then each higher order DCT coefficient can be reconstructed according to the following formula.

$$\tilde{C}_{i,k} = \begin{cases} 0 & \text{if } \tilde{B}_m = 0 \\ \tilde{\Delta}_m (\tilde{b}_m - 2^{\tilde{B}_m - 1} + .5) & \text{otherwise} \end{cases} \quad \text{for } 8 \le m \le \tilde{L}+1 \qquad (71)$$


where as in Equation (64), the following equation can be used to relate $m$, $i$, and $k$.

$$m = 6 + k + \sum_{n=1}^{i-1} \tilde{J}_n \qquad (72)$$

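Once the DCT coefficients $\tilde{C}_{i,k}$ have been reconstructed, an inverse DCT is computed on each of the six blocks to form the vectors $\tilde{c}_{i,j}$. This is done using the following equations for $1 \le i \le 6$.

$$\tilde{c}_{i,j} = \sum_{k=1}^{\tilde{J}_i} \alpha(k)\, \tilde{C}_{i,k} \cos\left[ \frac{\pi (k-1)(j - \frac{1}{2})}{\tilde{J}_i} \right] \quad \text{for } 1 \le j \le \tilde{J}_i \qquad (73)$$

$$\alpha(k) = \begin{cases} 1 & \text{if } k = 1 \\ 2 & \text{otherwise} \end{cases} \qquad (74)$$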


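The six transformed blocks $\tilde{c}_{i,j}$ are then joined to form a single vector of length $\tilde{L}$, which is denoted $\tilde{T}_l$ for $1 \le l \le \tilde{L}$. The vector $\tilde{T}_l$ corresponds to the reconstructed spectral amplitude prediction residuals. The adopted convention is that the first $\tilde{J}_1$ elements of $\tilde{T}_l$ are equal to $\tilde{c}_{1,j}$ for $1 \le j \le \tilde{J}_1$. The next $\tilde{J}_2$ elements of $\tilde{T}_l$ are equal to $\tilde{c}_{2,j}$ for $1 \le j \le \tilde{J}_2$. This continues until the last $\tilde{J}_6$ elements of $\tilde{T}_l$ are equal to $\tilde{c}_{6,j}$ for $1 \le j \le \tilde{J}_6$. Finally, the reconstructed $\log_2$ spectral amplitudes for the current frame are computed using the following equations.

$$\tilde{k}_l = \frac{\tilde{L}(-1)}{\tilde{L}(0)} \cdot l \qquad (75)$$

$$\tilde{\delta}_l = \tilde{k}_l - \lfloor \tilde{k}_l \rfloor \qquad (76)$$

$$\log_2 \tilde{M}_l(0) = \tilde{T}_l + \rho(1-\tilde{\delta}_l)\log_2 \tilde{M}_{\lfloor \tilde{k}_l \rfloor}(-1) + \rho\,\tilde{\delta}_l \log_2 \tilde{M}_{\lfloor \tilde{k}_l \rfloor + 1}(-1) - \frac{\rho}{\tilde{L}(0)} \sum_{\lambda=1}^{\tilde{L}(0)} \left[ (1-\tilde{\delta}_\lambda)\log_2 \tilde{M}_{\lfloor \tilde{k}_\lambda \rfloor}(-1) + \tilde{\delta}_\lambda \log_2 \tilde{M}_{\lfloor \tilde{k}_\lambda \rfloor + 1}(-1) \right] \qquad (77)$$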

In order to reconstruct $\tilde{M}_l(0)$ using equations (75) through (77), the following assumptions are always made:

$$\tilde{M}_0(-1) = 1.0 \qquad (78)$$

$$\tilde{M}_l(-1) = \tilde{M}_{\tilde{L}(-1)}(-1) \quad \text{for } l > \tilde{L}(-1) \qquad (79)$$

In addition it is assumed that upon initialization $\tilde{M}_l(-1) = 1$ for all $l$, and $\tilde{L}(-1) = 30$. Note that later sections of the IMBE decoder require the spectral amplitudes, $\tilde{M}_l$ for $1 \le l \le \tilde{L}$, which must be computed by applying the inverse $\log_2$ to each of the values computed with Equation (77).
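The reconstruction step can be sketched in Python as the mirror image of the residual formation: the interpolated, mean-removed prediction from the previous frame is added back to each residual, and the inverse base-2 logarithm is applied. The sketch assumes that reading of Equations (75) through (79); the function names and list layout (index 0 holding the D.C. term of the previous frame) are hypothetical.

```python
import math

def rho(L):
    """Prediction coefficient as a function of the number of harmonics (Equation (55))."""
    if L <= 15:
        return 0.4
    if L <= 24:
        return 0.03 * L - 0.05
    return 0.7

def reconstruct_amplitudes(T, M_prev):
    """T: residuals [T_1..T_L]; M_prev: [M_0(-1) = 1.0, M_1(-1), ..., M_Lprev(-1)]."""
    L0, Lp = len(T), len(M_prev) - 1
    p = rho(L0)

    def log_prev(idx):
        # Indices beyond the previous frame reuse its last amplitude (Equation (79)).
        return math.log2(M_prev[min(idx, Lp)])

    interp = []
    for l in range(1, L0 + 1):
        k = Lp / L0 * l
        fl = math.floor(k)
        d = k - fl
        interp.append((1 - d) * log_prev(fl) + d * log_prev(fl + 1))
    mean = sum(interp) / L0
    # Inverse log2 of the reconstructed log amplitudes.
    return [2.0 ** (T[i] + p * interp[i] - p * mean) for i in range(L0)]

first_frame = reconstruct_amplitudes([1.0] * 9, [1.0] * 31)
```

Starting from the initialization state, residuals of 1.0 decode to amplitudes of 2.0, since the prediction from an all-ones previous frame is zero.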

One final note is that it should be clear that the IMBE speech coder uses adaptive bit allocation and quantization which is dependent upon the number of harmonics in each frame. At the encoder the value $\hat{L}$ is used to determine the bit allocation and quantizer step sizes, while at the decoder the value $\tilde{L}$ is used to determine the bit allocation and quantizer step sizes. In order to ensure proper operation it is necessary that these two values be equal (i.e. $\tilde{L} = \hat{L}$). The encoder and decoder are designed to ensure this property except in the presence of a very large number of bit errors. In addition, the decoder is designed to detect frames where a large number of bit errors may prevent the generation of the correct bit allocation and quantizer step sizes. In this case the decoder discards the bits for the current frame and repeats the parameters from the previous frame. This is discussed in more detail in later sections of this document.
Fig. 20: Encoder Bit Manipulations

7 Bit Manipulations

The IMBE coder uses a number of different manipulations in order to increase its robustness to channel degradations. The quantizer values, $\hat{b}_0, \ldots, \hat{b}_{\hat{L}+1}$, are first prioritized into a set of bit vectors denoted by $\hat{u}_0, \ldots, \hat{u}_6$. These vectors are optionally encrypted, and then they are protected with error control codes to produce a set of code vectors denoted by $\hat{v}_0, \ldots, \hat{v}_6$. These code vectors are then modulated to produce a set of modulated code vectors denoted by $\hat{c}_0, \ldots, \hat{c}_6$. Finally, intra-frame interleaving is used on the modulated code vectors in order to spread the effect of short burst errors. A block diagram of the bit manipulations performed by the encoder is shown in Figure 20.

The IMBE decoder reverses the bit manipulations performed by the encoder. First the decoder de-interleaves each frame to obtain the seven modulated code vectors, $\tilde{c}_0, \ldots, \tilde{c}_6$. The decoder then demodulates these vectors to produce the code vectors $\tilde{v}_0, \ldots, \tilde{v}_6$ and then error control decodes these code vectors to produce the bit vectors $\tilde{u}_0, \ldots, \tilde{u}_6$. Next the decoder must decrypt the bit vectors (if encryption is employed at the encoder), and then it must rearrange the bit vectors to reconstruct the quantizer values, denoted by $\tilde{b}_0, \ldots, \tilde{b}_{\tilde{L}+1}$. These values are further decoded using the techniques described in Section 6 and finally used to synthesize the current frame of speech. A block diagram of the bit manipulations performed by the decoder is shown in Figure 21.



Fig 21: Decoder Bit Manipulations 

7.1 Bit Prioritization 

The first bit manipulation performed by the IMBE encoder is a rearrangement of the quantizer values $\hat{b}_0, \ldots, \hat{b}_{\hat{L}+1}$ into a set of seven prioritized bit vectors $\hat{u}_0, \ldots, \hat{u}_6$. The bit vector $\hat{u}_0$ is 7 bits long. The bit vectors $\hat{u}_1$ through $\hat{u}_3$ are 12 bits long, $\hat{u}_4$ and $\hat{u}_5$ are 11 bits long, and $\hat{u}_6$ is 23 bits long. Throughout this section the convention has been adopted that the bit N-1, where N is the vector length, is the MSB and bit 0 is the LSB.

Prioritization of quantizer values into the set of bit vectors begins with $\hat{u}_0$. The most significant bit of $\hat{u}_0$ is set to 0. The six remaining bits of $\hat{u}_0$ (i.e. bits 6 through 1) are set equal to the six most significant bits (bits 7 through 2) of $\hat{b}_0$. The three most significant bits of $\hat{u}_1$ (bits 11 through 9) are set equal to the three most significant bits of $\hat{b}_2$. The quantizer values $\hat{b}_3$ through $\hat{b}_{\hat{L}+1}$ are scanned as described below and the 32 highest priority bits are copied to the remaining 9 bits of $\hat{u}_1$, the 12 bits of $\hat{u}_2$, and the 11 most significant bits (bits 11 through 1) of $\hat{u}_3$. Bit 0 of $\hat{u}_3$ is set equal to bit 2 of $\hat{b}_2$ and the most significant bit of $\hat{u}_4$ (bit 10) is set equal to bit 1 of $\hat{b}_2$. Next, all the bits of $\hat{b}_1$ are inserted into the prioritized bit vectors beginning with bit 9 of $\hat{u}_4$. Scanning of the quantizer values $\hat{b}_3, \ldots, \hat{b}_{\hat{L}+1}$ then continues and the remaining bits of $\hat{b}_3, \ldots, \hat{b}_{\hat{L}+1}$ are inserted in the prioritized bit vectors. Scanning is complete when bit 2 of $\hat{u}_6$ is reached. Bit 0 of $\hat{b}_2$ is copied to bit 2 of $\hat{u}_6$ and bits 1 and 0 of $\hat{b}_0$ are copied to bits 1 and 0 of $\hat{u}_6$, respectively. Specifically, these quantizer values are arranged as shown in Figure 22. In this figure the shaded areas represent the number of bits which were allocated to each of these values assuming $\hat{L} = 16$. Note that for other values of $\hat{L}$ this figure would change in accordance with the bit allocation information contained in Annex F and Annex G. The remaining three bits of $\hat{u}_0$ are then selected by beginning in the upper left hand corner of this figure (i.e. bit 10 of $\hat{b}_3$) and scanning left to right. When the end of any row is reached the scanning proceeds from left to right on the next lower row. Bit 8 of $\hat{u}_1$ is set equal to the bit corresponding to the first shaded block which is encountered using the prescribed scanning order. Similarly, bit 7 of $\hat{u}_1$ is set equal to the bit corresponding to the second shaded block which is encountered and bit 6 of $\hat{u}_0$ is set equal to the bit corresponding to the third shaded block which is encountered.

The scanning of the spectral amplitude quantizer values $\hat{b}_3$ through $\hat{b}_{\hat{L}+1}$ which is used to generate the last nine bits of $\hat{u}_1$ is continued for the bit vectors $\hat{u}_1$ through $\hat{u}_3$. Each successive bit in these vectors is set equal to the bit corresponding to the next shaded block. This process begins with bit 8 of $\hat{u}_1$, proceeds through bit 0 of $\hat{u}_1$ followed by bit 11 of $\hat{u}_2$, and continues in this manner until finally reaching bit 1 of $\hat{u}_3$. At this point the 43 highest priority bits have been assigned to the bit vectors $\hat{u}_0$ through $\hat{u}_3$ as shown in Figure 23.

The next bits to be inserted into the bit vectors are all of the bits of $\hat{b}_1$ (starting with the MSB), followed by bit 2 and then bit 1 of $\hat{b}_2$, and then continuing with the scanning of $\hat{b}_3$ through $\hat{b}_{\hat{L}+1}$ as described above. These bits are inserted into the bit vectors beginning with bit 10 of $\hat{u}_4$, proceeding through bit 0 of $\hat{u}_4$ followed by bit 10 of $\hat{u}_5$, and continuing in this manner until finally reaching bit 4 of $\hat{u}_6$. The final three bits of $\hat{u}_6$, beginning with bit 2 and ending with bit 0, are set equal to bit 0 of $\hat{b}_2$, bit 1 of $\hat{b}_0$, and bit 0 of $\hat{b}_0$, respectively. A block diagram of this procedure is shown in Figure 24 for $\hat{K} = 6$.
[Figure: bit positions 10 (MSB) down to 0 on the vertical axis; quantizer values $\hat{b}_3$ through $\hat{b}_{17} = \hat{b}_{\hat{L}+1}$ on the horizontal axis, for $\hat{L} = 16$]

Figure 22: Priority Scanning of $\hat{b}_3$ through $\hat{b}_{\hat{L}+1}$

7.2 Encryption 

This document treats optional encryption and decryption as transparent elements and does not attempt to define the actual encryption process. However, Figures 20 and 21 depict an encryption and decryption element, respectively, in order to illustrate the proper placement of these elements in the IMBE vocoder. During encryption the bit vectors $\hat{u}_0, \ldots, \hat{u}_6$ are combined bit-by-bit with an encryption sequence. This same sequence must be used at the decoder to recover the bit vectors $\hat{u}_0, \ldots, \hat{u}_6$. In order to be interoperable the encryption and decryption process must each use the same bit ordering. The standard ordering begins with the most significant bit (MSB) of $\hat{u}_0$ and continues in order of significance until the least significant bit (LSB) of $\hat{u}_0$ is reached. This is followed in order by the bit vectors $\hat{u}_1, \ldots, \hat{u}_6$ respectively, where each bit vector proceeds from MSB to LSB.



Figure 23: Formation of the Code Vectors $\hat{v}_0$ through $\hat{v}_3$

7.3 Error Control Coding 

Forward error correction (FEC) codes are used to transform the prioritized bit vectors $\hat{u}_0$ through $\hat{u}_6$ into the vectors $\hat{v}_0$ through $\hat{v}_6$ as shown in Figure 20. The 54 FEC bits are added to the 88 speech bits in $\hat{u}_0$ through $\hat{u}_6$ to produce $\hat{v}_0$ through $\hat{v}_6$. The 54 FEC bits are divided among one [19,7] extended Golay code, one [24,12] extended Golay code, two [23,12] Golay codes, and two [15,11] Hamming codes. Generation of the $\hat{v}$ vectors is performed according to the following equations.

$$\hat{v}_0 = \hat{u}_0 \cdot g_{EG} \qquad (80)$$

$$\hat{v}_1 = \hat{u}_1 \cdot g_{EG} \qquad (81)$$

$$\hat{v}_2 = \hat{u}_2 \cdot g_G \qquad (82)$$

$$\hat{v}_3 = \hat{u}_3 \cdot g_G \qquad (83)$$

$$\hat{v}_4 = \hat{u}_4 \cdot g_H \qquad (84)$$

$$\hat{v}_5 = \hat{u}_5 \cdot g_H \qquad (85)$$

$$\hat{v}_6 = \hat{u}_6 \qquad (86)$$

where $g_{EG}$ is the generator matrix for the [24,12] extended Golay code, $g_G$ is the generator matrix for the [23,12] Golay code, and $g_H$ is the generator matrix for the [15,11] Hamming code. Elements of the matrices $g_{EG}$, $g_G$, and $g_H$ are given below. Absent entries in the tables are equal to zero. Note that the matrix $g_G$ is simply the first 23 columns of the matrix $g_{EG}$.


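Figure 24: Formation of the Code Vectors $\hat{v}_4$ through $\hat{v}_6$ (each vector is shown from MSB to LSB,
after priority scanning)

$g_{EG}$ =

1 0 0 0 0 0 0 0 0 0 0 0   1 1 0 0 0 1 1 1 0 1 0 1
0 1 0 0 0 0 0 0 0 0 0 0   0 1 1 0 0 0 1 1 1 0 1 1
0 0 1 0 0 0 0 0 0 0 0 0   1 1 1 1 0 1 1 0 1 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0   0 1 1 1 1 0 1 1 0 1 0 0
0 0 0 0 1 0 0 0 0 0 0 0   0 0 1 1 1 1 0 1 1 0 1 0
0 0 0 0 0 1 0 0 0 0 0 0   1 1 0 1 1 0 0 1 1 0 0 1
0 0 0 0 0 0 1 0 0 0 0 0   0 1 1 0 1 1 0 0 1 1 0 1
0 0 0 0 0 0 0 1 0 0 0 0   0 0 1 1 0 1 1 0 0 1 1 1
0 0 0 0 0 0 0 0 1 0 0 0   1 1 0 1 1 1 0 0 0 1 1 0
0 0 0 0 0 0 0 0 0 1 0 0   1 0 1 0 1 0 0 1 0 1 1 1
0 0 0 0 0 0 0 0 0 0 1 0   1 0 0 1 0 0 1 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 1   1 0 0 0 1 1 1 0 1 0 1 1

$g_G$ =

1 0 0 0 0 0 0 0 0 0 0 0   1 1 0 0 0 1 1 1 0 1 0
0 1 0 0 0 0 0 0 0 0 0 0   0 1 1 0 0 0 1 1 1 0 1
0 0 1 0 0 0 0 0 0 0 0 0   1 1 1 1 0 1 1 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0   0 1 1 1 1 0 1 1 0 1 0
0 0 0 0 1 0 0 0 0 0 0 0   0 0 1 1 1 1 0 1 1 0 1
0 0 0 0 0 1 0 0 0 0 0 0   1 1 0 1 1 0 0 1 1 0 0
0 0 0 0 0 0 1 0 0 0 0 0   0 1 1 0 1 1 0 0 1 1 0
0 0 0 0 0 0 0 1 0 0 0 0   0 0 1 1 0 1 1 0 0 1 1
0 0 0 0 0 0 0 0 1 0 0 0   1 1 0 1 1 1 0 0 0 1 1
0 0 0 0 0 0 0 0 0 1 0 0   1 0 1 0 1 0 0 1 0 1 1
0 0 0 0 0 0 0 0 0 0 1 0   1 0 0 1 0 0 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 1   1 0 0 0 1 1 1 0 1 0 1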
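
Because both Golay generators are systematic, the encoding of Equations (81) through (83) reduces to XOR-ing one parity row per set input bit. The following Python sketch shows this under one assumption that the text does not state here: the first element of the bit vector (the MSB of the integer) multiplies the first row of the matrix. The function names are hypothetical.

```python
# Parity part P of g_EG = [I_12 | P], one 12-bit row per input bit,
# transcribed from the g_EG table (row 1 first).
GOLAY_PARITY_ROWS = [
    0xC75, 0x63B, 0xF68, 0x7B4, 0x3DA, 0xD99,
    0x6CD, 0x367, 0xDC6, 0xA97, 0x93E, 0x8EB,
]


def encode_extended_golay(u: int) -> int:
    """Return the 24-bit product u * g_EG over GF(2) for a 12-bit input u.

    Assumption: the MSB of u corresponds to the first row of g_EG.
    """
    if not 0 <= u < (1 << 12):
        raise ValueError("u must be a 12-bit value")
    parity = 0
    for i, row in enumerate(GOLAY_PARITY_ROWS):
        if (u >> (11 - i)) & 1:
            parity ^= row  # modulo-2 addition of the selected rows
    return (u << 12) | parity


def encode_golay(u: int) -> int:
    """Return the 23-bit product u * g_G; g_G is the first 23 columns of g_EG."""
    return encode_extended_golay(u) >> 1
```

Dropping the last parity bit in `encode_golay` mirrors the remark that $g_G$ is the first 23 columns of $g_{EG}$.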





TIA/EIA/IS-69.5 


1 0 
0 1 0 
0 1 0 
0 1 
0 


& H = 


0 

1 0 
0 1 
0 


0 

1 0 
0 1 
0 


1 0 
1 1 
1 1 
1 1 
0 1 
1 0 
0 1 

0 1 0 
10 11 
0 10 0 1 
0 10 0 


0 1 
0 1 
1 1 
1 0 
1 1 
1 0 
0 1 
1 1 
0 0 
1 0 
1 1 


7.4 Bit Modulation 


The IMBE speech coder uses bit modulation keyed off the code vector $\hat{v}_0$ to provide a mechanism for
detecting errors in $\hat{v}_0$ beyond the errors that the [19,7] extended Golay code can correct. Note that the
term bit modulation in the context of this document refers to the presented method for multiplying (or
modulating) each frame of code vectors by a data dependent pseudo-random sequence. The first step in this
procedure is to generate a set of binary modulation vectors which are added (modulo 2) to the code vectors
$\hat{v}_0$ through $\hat{v}_6$. The modulation vectors are generated from a pseudo-random sequence whose seed
is derived from $\hat{u}_0$. Specifically, the sequence defined in the following equations is used, where the
bit vector $\hat{u}_0$ is interpreted as an unsigned 7 bit number in the range [0, 127].

$p_r(0) = 16\,\hat{u}_0$    (87)

$p_r(n) = 173\,p_r(n-1) + 13849 - 65536 \left\lfloor \frac{173\,p_r(n-1) + 13849}{65536} \right\rfloor$    (88)

Equation (88) is used to recursively compute the pseudo-random sequence, $p_r(n)$, over the range
$1 \le n \le 100$. Each element of this sequence can be interpreted as a 16 bit random number which is
uniformly distributed over the interval [0, 65535]. Using this interpretation, a set of binary modulation
vectors, denoted by $\hat{m}_0$ through $\hat{m}_6$, are generated from this sequence as shown below.


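$\hat{m}_0 = [0, 0, \ldots, 0]$    (89)

$\hat{m}_1 = \left[ \lfloor p_r(1)/32768 \rfloor, \lfloor p_r(2)/32768 \rfloor, \ldots, \lfloor p_r(24)/32768 \rfloor \right]$    (90)

$\hat{m}_2 = \left[ \lfloor p_r(25)/32768 \rfloor, \lfloor p_r(26)/32768 \rfloor, \ldots, \lfloor p_r(47)/32768 \rfloor \right]$    (91)

$\hat{m}_3 = \left[ \lfloor p_r(48)/32768 \rfloor, \lfloor p_r(49)/32768 \rfloor, \ldots, \lfloor p_r(70)/32768 \rfloor \right]$    (92)

$\hat{m}_4 = \left[ \lfloor p_r(71)/32768 \rfloor, \lfloor p_r(72)/32768 \rfloor, \ldots, \lfloor p_r(85)/32768 \rfloor \right]$    (93)

$\hat{m}_5 = \left[ \lfloor p_r(86)/32768 \rfloor, \lfloor p_r(87)/32768 \rfloor, \ldots, \lfloor p_r(100)/32768 \rfloor \right]$    (94)

$\hat{m}_6 = [0, 0, \ldots, 0]$    (95)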
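
Equations (87) through (95) amount to a 16 bit linear congruential generator whose top bit supplies each modulation bit. A minimal Python sketch follows. The lengths used for $\hat{m}_0$ and $\hat{m}_6$ (19 and 23 bits) are an assumption inferred from the code vector sizes in Section 7.3, and the function names are hypothetical.

```python
from typing import List

# Code vector lengths for v_0..v_6 (assumption: inferred from Section 7.3).
CODE_LENGTHS = (19, 24, 23, 23, 15, 15, 23)


def pr_sequence(u0: int) -> List[int]:
    """Return [p_r(0), ..., p_r(100)] per Equations (87) and (88)."""
    if not 0 <= u0 <= 127:
        raise ValueError("u0 must be an unsigned 7 bit number")
    pr = [16 * u0]
    for _ in range(100):
        # Subtracting 65536 * floor(x / 65536) is the same as x mod 65536.
        pr.append((173 * pr[-1] + 13849) % 65536)
    return pr


def modulation_vectors(u0: int) -> List[List[int]]:
    """Return m_0..m_6 as lists of bits per Equations (89) through (95)."""
    pr = pr_sequence(u0)
    bits = [p // 32768 for p in pr]  # floor(p_r(n) / 32768): the top bit
    vectors = [[0] * CODE_LENGTHS[0]]  # m_0 is all zeros
    n = 1
    for length in CODE_LENGTHS[1:6]:
        vectors.append(bits[n:n + length])
        n += length
    vectors.append([0] * CODE_LENGTHS[6])  # m_6 is all zeros
    return vectors
```

The five non-zero vectors consume exactly $p_r(1)$ through $p_r(100)$, which matches the range stated for Equation (88).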


Once these modulation vectors have been computed in this manner, the modulated code vectors, $\hat{c}_i$ for
$0 \le i \le 6$, are computed by adding (modulo 2) the code vectors to the modulation vectors.

$\hat{c}_i = \hat{v}_i + \hat{m}_i$ for $0 \le i \le 6$    (96)

One should note that the bit modulation performed by the IMBE encoder can be inverted by the decoder if
$\hat{c}_0$ does not contain any uncorrectable bit errors. In this case Golay decoding $\hat{c}_0$, which always
equals $\hat{v}_0$ since $\hat{m}_0 = 0$, will yield the correct value of $\hat{u}_0$. The decoder can then use
$\hat{u}_0$ to reconstruct the pseudo-random sequence and the modulation vectors $\hat{m}_1$ through
$\hat{m}_6$. Subtracting these vectors from $\hat{c}_1$ through $\hat{c}_6$ will then yield the code vectors
$\hat{v}_1$ through $\hat{v}_6$. At this point the remaining error control decoding can be performed. In the
other case, where $\hat{c}_0$ contains uncorrectable bit errors, the modulation cannot generally be inverted by
the decoder. In this case the likely result of Golay decoding $\hat{c}_0$ will be some value $\tilde{u}_0$
which does not equal $\hat{u}_0$. Consequently the decoder will initialize the pseudo-random sequence
incorrectly, and the modulation vectors computed by the decoder will be uncorrelated with the modulation
vectors used by the encoder. Using these incorrect modulation vectors to reconstruct the code vectors is
essentially the same as passing $\hat{v}_1, \ldots, \hat{v}_5$ through a 50 percent bit error rate (BER)
channel. The IMBE decoder exploits the fact that, statistically, a 50 percent BER causes the Golay and
Hamming codes employed on $\hat{v}_1$ through $\hat{v}_5$ to correct a number of errors which is near the
maximum capability of the code. By counting the total number of errors which are corrected in all of these
code vectors, the decoder is able to reliably detect frames in which $\hat{c}_0$ is likely to contain
uncorrectable bit errors. The decoder performs frame repeats during these frames in order to reduce the
perceived degradation in the presence of bit errors. This is explained more fully in Sections 7.6 and 7.7.

7.5 Bit Interleaving 

Intra-frame bit interleaving is used to spread short bursts of errors among several code words. The
interleaving table for the 142 bits in each frame is tabulated in Annex H. This annex uses the notation
scheme where bit N-1 (where N is the vector length) is the MSB and bit 0 is the LSB. The minimum
separation between any two bits of the same error correction code is, in most cases, 6 bits.

7.6 Error Estimation 

The IMBE speech decoder estimates the number of errors in each received data frame by computing the
number of errors corrected by the [19,7] extended Golay code, the [24,12] extended Golay code, the
[23,12] Golay codes, and the [15,11] Hamming codes. The number of errors for each code vector is
denoted $\epsilon_i$ for $0 \le i \le 6$, where $\epsilon_i$ refers to the number of bit errors which were
corrected during the error decoding of $\hat{u}_i$. From these error values two other error parameters are
computed as shown below.

$\epsilon_T = \sum_{i=0}^{6} \epsilon_i$    (97)

$\epsilon_R(0) = 0.95\,\epsilon_R(-1) + 0.00042\,\epsilon_T$    (98)

The parameter $\epsilon_R(0)$ is the estimate of the error rate for the current frame, while $\epsilon_R(-1)$
is the estimate of the error rate for the previous frame. These error parameters are used to control the
frame repeat process described below, and to control the parametric smoothing functions described in
Section 9.
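
The error-rate estimate is a first-order recursive average of the per-frame error count. A minimal Python sketch, assuming the per-vector counts are supplied by the FEC decoders as a list of seven integers (the function name is hypothetical):

```python
from typing import Sequence, Tuple


def update_error_estimates(errors: Sequence[int],
                           prev_error_rate: float) -> Tuple[int, float]:
    """Return (e_T, e_R(0)) from the per-vector error counts e_0..e_6.

    errors[i] is the number of bit errors corrected while decoding vector i;
    the uncoded vector (i = 6) contributes zero.
    """
    if len(errors) != 7:
        raise ValueError("expected one error count per code vector (7 total)")
    e_t = sum(errors)
    e_r = 0.95 * prev_error_rate + 0.00042 * e_t
    return e_t, e_r
```

For example, four corrected errors in a frame that follows an error-free history yield an error-rate estimate of 0.00168.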


7.7 Frame Repeats 

The IMBE decoder examines each received data frame in order to detect and discard frames which are
highly corrupted. A number of different fault conditions are checked and if any of these conditions indicate
the current frame is invalid, then a frame repeat is performed. The IMBE speech encoder uses values of
$\hat{b}_0$ in the range $0 \le \hat{b}_0 \le 207$ to represent valid pitch estimates. The remaining values of
$\hat{b}_0$ are reserved for future expansion and are currently considered invalid. A frame repeat is
performed by the decoder if it receives an invalid value of $\hat{b}_0$, or if both of the following two
equations are true.

$\epsilon_0 > 2$    (99)

$\epsilon_T > 12$    (100)

These two equations are used to detect the incorrect bit demodulation which results if there are
uncorrectable bit errors in $\hat{c}_0$. The decoder performs a frame repeat by taking the following steps:

1) The current 142 bit received data frame is marked as invalid and subsequently ignored during future
processing steps.

2) The IMBE model parameters for the current frame are set equal to the IMBE model parameters for
the previous frame. Specifically, the following update expressions are computed.

$\tilde{\omega}_0(0) = \tilde{\omega}_0(-1)$    (101)

$\tilde{L}(0) = \tilde{L}(-1)$    (102)

$\tilde{K}(0) = \tilde{K}(-1)$    (103)

$\tilde{v}_k(0) = \tilde{v}_k(-1)$ for $1 \le k \le \tilde{K}$    (104)

$\tilde{M}_l(0) = \tilde{M}_l(-1)$ for $1 \le l \le \tilde{L}$    (105)

$\bar{M}_l(0) = \bar{M}_l(-1)$ for $1 \le l \le \tilde{L}$    (106)


3) The repeated model parameters are used in all future processing wherever the current model 
parameters are required. This includes the synthesis of the current segment of speech as is described 
in Section 11. 
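
The frame repeat decision and the parameter copy of step 2 can be sketched in Python as follows. The `ModelParameters` container and both function names are hypothetical, and the comparisons follow Equations (99) and (100) as printed.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelParameters:
    """Hypothetical container for the decoded IMBE model parameters."""
    w0: float                                               # fundamental frequency
    L: int                                                  # number of spectral amplitudes
    K: int                                                  # number of V/UV bands
    vuv: List[int] = field(default_factory=list)            # V/UV decisions
    amplitudes: List[float] = field(default_factory=list)   # unenhanced amplitudes
    enhanced: List[float] = field(default_factory=list)     # enhanced amplitudes


def frame_repeat_required(b0: int, e0: int, e_t: int) -> bool:
    """True if the pitch index is invalid or Equations (99) and (100) both hold."""
    invalid_pitch = not (0 <= b0 <= 207)
    return invalid_pitch or (e0 > 2 and e_t > 12)


def repeat_frame(previous: ModelParameters) -> ModelParameters:
    """Equations (101) through (106): reuse the previous frame's parameters."""
    return copy.deepcopy(previous)
```

A deep copy is used so that later enhancement or smoothing of the repeated frame cannot alter the stored previous frame.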

7.8 Frame Muting

The IMBE decoder is required to mute in severe bit error environments for which $\epsilon_R > .0875$. This
capability causes the decoder to squelch its output if reliable communication cannot be supported.

The recommended muting method is to first compute the update equations as listed in step (2) of
the frame repeat process (see Section 7.7). The decoder should then bypass the speech synthesis algorithm
described in Section 11 and, instead, set the synthetic speech signal, $\tilde{s}(n)$, to random noise which is
uniformly distributed over the interval (-5, 5). This technique provides for a small amount of “comfort
noise” as is typically done in telecommunication systems.
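
A minimal Python sketch of the muting rule is given below. The frame length of 160 samples is taken from Section 11.1; the function names and the use of Python's `random` module as the noise source are illustrative assumptions, since the text does not prescribe a particular generator for the comfort noise.

```python
import random
from typing import List, Optional

FRAME_SAMPLES = 160  # 20 ms frame length, per Section 11.1


def should_mute(error_rate: float) -> bool:
    """Mute when the error-rate estimate exceeds .0875."""
    return error_rate > 0.0875


def comfort_noise(n_samples: int = FRAME_SAMPLES,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Return one frame of low-level noise, uniform over the interval (-5, 5)."""
    rng = rng or random.Random()
    return [rng.uniform(-5.0, 5.0) for _ in range(n_samples)]
```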


8 Spectral Amplitude Enhancement 

The IMBE speech decoder attempts to improve the perceived quality of the synthesized speech by
enhancing the spectral amplitudes. The unenhanced spectral amplitudes are required by future frames in the
computation of Equation (77). However, the enhanced spectral amplitudes are used in speech synthesis. The
spectral amplitude enhancement is accomplished by generating a set of spectral weights from the model
parameters of the current frame. First $R_{M0}$ and $R_{M1}$ are calculated as shown below.

$R_{M0} = \sum_{l=1}^{\tilde{L}} \tilde{M}_l^2$    (107)

$R_{M1} = \sum_{l=1}^{\tilde{L}} \tilde{M}_l^2 \cos(\tilde{\omega}_0 l)$    (108)

Next, the parameters $R_{M0}$ and $R_{M1}$ are used to calculate a set of weights, $W_l$, given by

$W_l = \sqrt{\tilde{M}_l} \left[ \frac{.96\pi \left( R_{M0}^2 + R_{M1}^2 - 2 R_{M0} R_{M1} \cos(\tilde{\omega}_0 l) \right)}{\tilde{\omega}_0 R_{M0} \left( R_{M0}^2 - R_{M1}^2 \right)} \right]^{1/4}$ for $1 \le l \le \tilde{L}$    (109)

These weights are then used to enhance the spectral amplitudes for the current frame according to the
relationship:

$\bar{M}_l = \begin{cases} \tilde{M}_l & \text{if } 8l < \tilde{L} \\ 1.2 \cdot \tilde{M}_l & \text{else if } W_l > 1.2 \\ .5 \cdot \tilde{M}_l & \text{else if } W_l < .5 \\ W_l \cdot \tilde{M}_l & \text{otherwise} \end{cases}$ for $1 \le l \le \tilde{L}$    (110)


A final step is to scale the enhanced spectral amplitudes in order to remove any energy difference between
the enhanced and unenhanced amplitudes. The correct scale factor, denoted by $\gamma$, is given below.

$\gamma = \left[ \frac{R_{M0}}{\sum_{l=1}^{\tilde{L}} |\bar{M}_l|^2} \right]^{1/2}$    (111)

This scale factor is applied to each of the enhanced spectral amplitudes as shown in Equation (112).

$\bar{M}_l = \gamma \cdot \bar{M}_l$ for $1 \le l \le \tilde{L}$    (112)

For notational simplicity this equation refers to both the scaled and unscaled spectral amplitudes as
$\bar{M}_l$. This convention has been adopted since the unscaled amplitudes are discarded and only the scaled
amplitudes are subsequently used by the decoder during parameter smoothing and speech synthesis.
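
The enhancement and rescaling steps of Equations (107) through (112) can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes at least one non-zero amplitude and $R_{M0}^2 \ne R_{M1}^2$ so that the divisions are defined.

```python
import math
from typing import List, Sequence


def enhance_amplitudes(amplitudes: Sequence[float], w0: float) -> List[float]:
    """Return the enhanced, energy-normalized amplitudes for harmonics l = 1..L."""
    num = len(amplitudes)
    r_m0 = sum(m * m for m in amplitudes)                       # Equation (107)
    r_m1 = sum(m * m * math.cos(w0 * l)                         # Equation (108)
               for l, m in enumerate(amplitudes, start=1))
    enhanced = []
    for l, m in enumerate(amplitudes, start=1):
        if 8 * l < num:
            enhanced.append(m)            # lowest harmonics are left unchanged
            continue
        ratio = (0.96 * math.pi
                 * (r_m0 ** 2 + r_m1 ** 2 - 2.0 * r_m0 * r_m1 * math.cos(w0 * l))
                 / (w0 * r_m0 * (r_m0 ** 2 - r_m1 ** 2)))
        weight = math.sqrt(m) * ratio ** 0.25                   # Equation (109)
        weight = min(max(weight, 0.5), 1.2)                     # limits of Equation (110)
        enhanced.append(weight * m)
    gamma = math.sqrt(r_m0 / sum(m * m for m in enhanced))      # Equation (111)
    return [gamma * m for m in enhanced]                        # Equation (112)
```

By construction the output has the same total energy $R_{M0}$ as the unenhanced amplitudes, which is the purpose of the scale factor.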

The value of $R_{M0}$ expressed in Equation (107) is a measure of the energy in the current frame.
This value is used to update a local energy parameter in accordance with the following rule.

$S_E(0) = \begin{cases} .95\,S_E(-1) + .05\,R_{M0} & \text{if } .95\,S_E(-1) + .05\,R_{M0} > 10000.0 \\ 10000.0 & \text{otherwise} \end{cases}$    (113)

This equation generates the local energy parameter for the current frame, $S_E(0)$, from $R_{M0}$ and the
value of the local energy parameter from the previous frame, $S_E(-1)$. The parameter $S_E(0)$ is used in the
following section.


9 Adaptive Smoothing 

As part of the error control process described in Section 7.6, the decoder estimates two error rate
parameters, $\epsilon_T$ and $\epsilon_R$, which measure the total number of errors and the local error rate
for the current frame, respectively. These parameters are used by the decoder to adaptively smooth the
decoded model parameters. The result is improved performance in high bit error environments.

The first parameters to be smoothed by the decoder are the V/UV decisions. First an adaptive
threshold $V_M$ is calculated using equation (114),

$V_M = \begin{cases} \infty & \text{if } \epsilon_R(0) < .005 \text{ and } \epsilon_T < 4 \\ \frac{45.255\,(S_E(0))^{.375}}{\exp(277.26\,\epsilon_R(0))} & \text{else if } \epsilon_R(0) < .0125 \text{ and } \epsilon_4 = 0 \\ 1.414\,(S_E(0))^{.375} & \text{otherwise} \end{cases}$    (114)

where the energy parameter $S_E(0)$ is defined in Equation (113) in Section 8. After the adaptive threshold
is computed each enhanced spectral amplitude $\bar{M}_l$ for $1 \le l \le \tilde{L}$ is compared against
$V_M$, and if $\bar{M}_l > V_M$ then the V/UV decision for that spectral amplitude is declared voiced,
regardless of the decoded V/UV decision. Otherwise the decoded V/UV decision for that spectral amplitude is
left unchanged. This process can be expressed mathematically as shown below.

$\bar{v}_l = \begin{cases} 1 & \text{if } \bar{M}_l > V_M \\ \tilde{v}_l & \text{otherwise} \end{cases}$ for $1 \le l \le \tilde{L}$    (115)
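
The V/UV smoothing of Equations (114) and (115) can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the V/UV decisions have already been expanded to one decision per spectral amplitude.

```python
import math
from typing import List, Sequence


def adaptive_threshold(s_e: float, e_r: float, e_t: int, e_4: int) -> float:
    """Equation (114): threshold V_M from local energy and error statistics."""
    if e_r < 0.005 and e_t < 4:
        return math.inf                  # clean channel: never force voicing
    if e_r < 0.0125 and e_4 == 0:
        return 45.255 * s_e ** 0.375 / math.exp(277.26 * e_r)
    return 1.414 * s_e ** 0.375


def smooth_vuv(decisions: Sequence[int], enhanced: Sequence[float],
               v_m: float) -> List[int]:
    """Equation (115): declare voiced any amplitude that exceeds V_M."""
    return [1 if m > v_m else v for v, m in zip(decisions, enhanced)]
```

With an infinite threshold no amplitude can exceed $V_M$, so the decoded decisions pass through unchanged when the channel is clean.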


Once the V/UV decisions have been smoothed, the decoder adaptively smoothes the spectral
amplitudes $\bar{M}_l$ for $1 \le l \le \tilde{L}$. The spectral amplitude smoothing algorithm computes the
following amplitude measure for the current segment.

$A_M = \sum_{l=1}^{\tilde{L}} \bar{M}_l$    (116)

Next an amplitude threshold is updated according to the following equation,

$\tau_M(0) = \begin{cases} 20480 & \text{if } \epsilon_R(0) < .005 \text{ and } \epsilon_T < 6 \\ 6000 - 300\,\epsilon_T + \tau_M(-1) & \text{otherwise} \end{cases}$    (117)

where $\tau_M(0)$ and $\tau_M(-1)$ represent the value of the amplitude threshold for the current and previous
frames respectively. The two parameters $A_M$ and $\tau_M(0)$ are then used to compute a scale factor
$\gamma_M$ given below.

$\gamma_M = \begin{cases} 1.0 & \text{if } \tau_M(0) > A_M \\ \frac{\tau_M(0)}{A_M} & \text{otherwise} \end{cases}$    (118)

Fig. 25: Parameter Enhancement and Smoothing (the decoded $\tilde{M}_l$ and $\tilde{v}_l$ for
$1 \le l \le \tilde{L}$ are enhanced and smoothed under control of $\epsilon_T$ and $\epsilon_R$ before being
passed to speech synthesis)

This scale factor is multiplied by each of the spectral amplitudes $\bar{M}_l$ for $1 \le l \le \tilde{L}$.
Note that this step must be completed after spectral amplitude enhancement has been performed using the
methods of Section 8 and after $V_M$ has been computed according to Equation (114). The correct sequence is
shown in Figure 25.
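
The amplitude smoothing of Equations (116) through (118) limits the sum of the spectral amplitudes to a threshold that tracks channel conditions. A minimal Python sketch, with hypothetical function names and assuming the quotient $\tau_M(0)/A_M$ for the second case of the scale factor:

```python
from typing import List, Sequence


def update_amplitude_threshold(prev_tau: float, e_r: float, e_t: int) -> float:
    """Equation (117): amplitude threshold for the current frame."""
    if e_r < 0.005 and e_t < 6:
        return 20480.0
    return 6000.0 - 300.0 * e_t + prev_tau


def smooth_amplitudes(enhanced: Sequence[float], tau: float) -> List[float]:
    """Equations (116) and (118): scale amplitudes so their sum is at most tau."""
    a_m = sum(enhanced)                           # amplitude measure A_M
    gamma_m = 1.0 if tau > a_m else tau / a_m     # scale factor
    return [gamma_m * m for m in enhanced]
```

For example, amplitudes summing to 300 against a threshold of 150 are all halved, while amplitudes already below the threshold are returned unchanged.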


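$i$    $\hat{J}_i$    $\hat{c}_{i,j}$
1    2    $\hat{T}_1, \hat{T}_2$
2    2    $\hat{T}_3, \hat{T}_4$
3    3    $\hat{T}_5, \hat{T}_6, \hat{T}_7$
4    3    $\hat{T}_8, \hat{T}_9, \hat{T}_{10}$
5    3    $\hat{T}_{11}, \hat{T}_{12}, \hat{T}_{13}$
6    3    $\hat{T}_{14}, \hat{T}_{15}, \hat{T}_{16}$

Table 5: Division of Prediction Residuals into Blocks in Encoding Example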


10 Parameter Encoding Example 


This section provides an example of the quantization and bit manipulation for a typical parameter
frame. In this example the fundamental frequency is assumed to be equal to $\hat{\omega}_0 = 2\pi/35.125$.
Since the values of $\hat{L}$ and $\hat{K}$ are related to $\hat{\omega}_0$ through equations (37) and (38),
they are equal to $\hat{L} = 16$ and $\hat{K} = 6$. The remaining model parameters are left unspecified since
they do not affect the numbers presented in this example.

The encoding of this example parameter frame proceeds as follows. First the fundamental
frequency is encoded into the 8 bit value $\hat{b}_0$ using equation (45), and the 6 voiced/unvoiced decisions
are encoded into the 6 bit value $\hat{b}_1$ using equation (49). The 16 spectral amplitude prediction
residuals, $\hat{T}_l$ for $1 \le l \le 16$, are then formed using equations (52) through (55). Next, these
prediction residuals are divided into six blocks where the lengths of each block, $\hat{J}_i$ for
$1 \le i \le 6$, are shown in Table 5. The spectral amplitude prediction residuals are then divided into the
six vectors $\hat{c}_{i,j}$ for $1 \le i \le 6$ and $1 \le j \le \hat{J}_i$. The first $\hat{J}_1$ elements of
$\hat{T}_l$ form $\hat{c}_{1,j}$. The next $\hat{J}_2$ elements of $\hat{T}_l$ form $\hat{c}_{2,j}$, and so on.
This is shown in Table 5. Each block $\hat{c}_{i,j}$ for $1 \le i \le 6$, is transformed with a $\hat{J}_i$
point DCT using equation (60) to produce the set of DCT coefficients $\hat{C}_{i,k}$ for
$1 \le k \le \hat{J}_i$. The first DCT coefficient from each of the six blocks is used to form the gain vector
$\hat{R}_i$. The gain vector is then transformed into the vector $\hat{G}_m$ using the six point DCT shown
in Equation (61). The first element of the transformed gain vector, denoted by $\hat{G}_1$, is then quantized
using the non uniform quantizer tabulated in Annex E. The 6 bit value $\hat{b}_2$ is set equal to the index of
the quantizer element which is closest to $\hat{G}_1$. The remaining five elements of the transformed gain
vector are quantized using the uniform quantizers generated from Annex F with $\hat{L} = 16$. The correct step
sizes and bit allocation for $\hat{G}_2$ through $\hat{G}_6$ are shown in Table 6. Once the bit allocation and
step sizes have been computed, the bit encodings $\hat{b}_3$ through $\hat{b}_7$ are generated using
Equation (62).

$m$    $\hat{G}_m$    $\hat{B}_m$    $\hat{\Delta}_m$
2    $\hat{G}_2$    6    .04650
3    $\hat{G}_3$    6    .03015
4    $\hat{G}_4$    6    .02520
5    $\hat{G}_5$    5    .04060
6    $\hat{G}_6$    5    .03696

Table 6: Example Bit Allocation and Step Size for the Transformed Gain Vector

After the gain vector has been quantized and encoded, the remaining bits are distributed among the
ten higher order DCT coefficients, $\hat{C}_{i,k}$ for $1 \le i \le 6$ and $2 \le k \le \hat{J}_i$. This is done
using Annex G and the resulting bit allocation is shown in Table 7. Each DCT coefficient is then quantized
using equation (63). The step sizes for these quantizers are computed using Tables 3 and 4, and the results
are shown in Table 7.

Finally, the 18 bit encodings, $\hat{b}_0$ through $\hat{b}_{17}$, are then rearranged into the seven bit
vectors $\hat{u}_0$ through $\hat{u}_6$. This is accomplished using the procedure described in Section 7, and
the result is shown in Tables 8 through 10. The convention in these tables is that the appropriate bit from
the vector listed in the first two columns is set equal to the appropriate bit from the bit encoding listed in
the last two columns, where the least significant bit corresponds to bit 1. The bit vector $\hat{u}_0$ is
encoded with a [19,7] extended Golay code into the code vector $\hat{v}_0$. Bit vector $\hat{u}_1$ is encoded
with a [24,12] extended Golay code into the code vector $\hat{v}_1$. Bit vectors $\hat{u}_2$ and $\hat{u}_3$
are encoded with a [23,12] Golay code into the code vectors $\hat{v}_2$ and $\hat{v}_3$, respectively.
Similarly, the two bit vectors $\hat{u}_4$ and $\hat{u}_5$ are each encoded with a [15,11] Hamming code into the
code vectors $\hat{v}_4$ and $\hat{v}_5$. The vector $\hat{v}_6$ is set equal to $\hat{u}_6$. These code
vectors are then modulated using Equation (96) to produce the modulated code vectors $\hat{c}_0$ through
$\hat{c}_6$. The seven modulated code vectors are then interleaved as specified in Annex H, and finally the
frame bits are embedded in the Enhanced Digital Access Communications System format in ascending order.


$m$    $\hat{C}_{i,k}$    $\hat{B}_m$    $\hat{\Delta}_m$
8    $\hat{C}_{1,2}$    6    .04605
9    $\hat{C}_{2,2}$    6    .04605
10    $\hat{C}_{3,2}$    5    .08596
11    $\hat{C}_{3,3}$    4    .09640
12    $\hat{C}_{4,2}$    4    .12280
13    $\hat{C}_{4,3}$    3    .15665
14    $\hat{C}_{5,2}$    3    .19955
15    $\hat{C}_{5,3}$    3    .15665
16    $\hat{C}_{6,2}$    3    .19955
17    $\hat{C}_{6,3}$    2    .20485

Table 7: Example Bit Allocation and Step Size for Higher Order DCT Coefficients

11 Speech Synthesis

As was discussed in Section 5, the IMBE speech coder estimates a set of model parameters for each
speech frame. These parameters consist of the fundamental frequency $\hat{\omega}_0$, the V/UV decisions for
each frequency band $\hat{v}_k$, and the spectral amplitudes $\hat{M}_l$. After the transmitted bits are
received and decoded, a reconstructed set of model parameters is available for synthesizing speech. These
reconstructed model parameters (after parameter enhancement and smoothing) are denoted $\tilde{\omega}_0$,
$\tilde{v}_k$ and $\bar{M}_l$, and they correspond to the reconstructed fundamental frequency, V/UV decisions
and spectral amplitudes, respectively. In addition the parameter $\tilde{L}$, defined as the number of
spectral amplitudes in the current frame, is generated from $\tilde{b}_0$ according to Equation (47). Because
of a number of factors (such as quantization and channel errors) the reconstructed model parameters are not
the same as the estimated model parameters $\hat{\omega}_0$, $\hat{v}_k$ and $\hat{M}_l$.


11.1 Speech Synthesis Notation 


The IMBE speech synthesis algorithm uses the reconstructed model parameters to generate a speech
signal which is perceptually similar to the original speech signal. For each new set of model parameters, the
synthesis algorithm generates a 20 ms frame of speech, $\tilde{s}(n)$, which is interpolated between the
previous set of model parameters and the newest or current set of model parameters. The notation
$\tilde{L}(0)$, $\tilde{\omega}_0(0)$, $\tilde{v}_k(0)$ and $\bar{M}_l(0)$ is used to denote the current set of
reconstructed model parameters, while the notation $\tilde{L}(-1)$, $\tilde{\omega}_0(-1)$, $\tilde{v}_k(-1)$
and $\bar{M}_l(-1)$ is used to denote the previous set of reconstructed model parameters. For each new set of
model parameters, $\tilde{s}(n)$ is generated in the range $0 \le n < N$, where $N$ equals 160 samples (20 ms).
This synthetic speech signal is the output of the IMBE voice coder and is suitable for digital to analog
conversion with a sixteen bit converter.

The synthetic speech signal is divided into a voiced component $\tilde{s}_v(n)$ and an unvoiced component
$\tilde{s}_{uv}(n)$. These two components are synthesized separately, as shown in Figure 26, and then summed to
form $\tilde{s}(n)$. The unvoiced speech synthesis algorithm is discussed in Section 11.2 and the voiced speech
synthesis algorithm is discussed in Section 11.3.


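Fig. 26: IMBE Speech Synthesis (the MBE model parameters, $\tilde{v}_k$ for $1 \le k \le \tilde{K}$ and
$\bar{M}_l$ for $1 \le l \le \tilde{L}$, drive the unvoiced and voiced speech synthesis blocks, whose outputs
$\tilde{s}_{uv}(n)$ and $\tilde{s}_v(n)$ are summed to form the synthetic speech $\tilde{s}(n)$)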


11.2 Unvoiced Speech Synthesis 


The energy from unvoiced spectral amplitudes is synthesized with an unvoiced speech synthesis
algorithm. First a white noise sequence, $u(n)$, is generated. This noise sequence can have an arbitrary
mean. A recommended noise sequence (10) can be generated as shown below.

$u(n+1) = 171\,u(n) + 11213 - 53125 \left\lfloor \frac{171\,u(n) + 11213}{53125} \right\rfloor$    (119)

The noise sequence is initialized to $u(-105) = 3147$.
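
Equation (119) is a linear congruential generator with modulus 53125. A minimal Python sketch (the function name is hypothetical; the seed is the initial value given above):

```python
from typing import List


def noise_sequence(count: int, seed: int = 3147) -> List[int]:
    """Return `count` consecutive values of u(n), starting at the seed value.

    Subtracting 53125 * floor(x / 53125) in Equation (119) is the same as
    reducing x modulo 53125.
    """
    values = []
    u = seed
    for _ in range(count):
        values.append(u)
        u = (171 * u + 11213) % 53125
    return values
```

Each value lies in the range 0 to 53124, so the sequence has a non-zero mean, which the text explicitly permits.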

For each successive synthesis frame, $u(n)$ is shifted by 20 ms (160 samples) and windowed with
$w_S(n)$, which is given in Annex I. Since $w_S(n)$ has a non-zero length of 209 samples, there is a 49
sample overlap between the noise signal used in successive synthesis frames. Once the noise sequence has
been shifted and windowed, the 256 point Discrete Fourier Transform is computed according to:

$U_w(m) = \sum_{n=-104}^{104} u(n)\, w_S(n)\, e^{-j 2\pi m n / 256}$ for $-128 \le m \le 127$    (120)

The function $U_w(m)$ is generated in a manner which is analogous to $S_w(m)$ defined in Equation (28)
except that $u(n)$ and $w_S(n)$ are used in place of $s(n)$ and $w_R(n)$.

The function $U_w(m)$ is then modified to create $\tilde{U}_w(m)$. For each $l$ in the range
$1 \le l \le \tilde{L}(0)$, $\tilde{U}_w(m)$ is computed according to Equation (121) if the $l$'th spectral
amplitude is voiced, and according to Equation (122) if the $l$'th spectral amplitude is unvoiced.

$\tilde{U}_w(m) = 0$ for $\lceil \tilde{a}_l \rceil \le |m| < \lceil \tilde{b}_l \rceil$    (121)

$\tilde{U}_w(m) = \frac{\gamma_w\, \bar{M}_l\, U_w(m)}{\left[ \frac{\sum_{\eta = \lceil \tilde{a}_l \rceil}^{\lceil \tilde{b}_l \rceil - 1} |U_w(\eta)|^2}{\lceil \tilde{b}_l \rceil - \lceil \tilde{a}_l \rceil} \right]^{1/2}}$ for $\lceil \tilde{a}_l \rceil \le |m| < \lceil \tilde{b}_l \rceil$    (122)


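The unvoiced scaling coefficient $\gamma_w$ is a function of the synthesis window $w_S(n)$ and the pitch
refinement window $w_R(n)$. It is computed according to the formula:

$\gamma_w = \left[ \sum_{n=-110}^{110} w_R(n) \right] \cdot \left[ \frac{\sum_{n=-104}^{104} w_S^2(n)}{\sum_{n=-110}^{110} w_R^2(n)} \right]^{1/2}$    (123)

The frequency band edges $\tilde{a}_l$ and $\tilde{b}_l$ are computed from $\tilde{\omega}_0$ according to
equations (124) and (125), respectively.

$\tilde{a}_l = \frac{256}{2\pi} (l - .5) \cdot \tilde{\omega}_0$    (124)

$\tilde{b}_l = \frac{256}{2\pi} (l + .5) \cdot \tilde{\omega}_0$    (125)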
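
The band edges of Equations (124) and (125) map each harmonic to a range of DFT bins. A minimal Python sketch, assuming that the bins used for harmonic $l$ run from $\lceil \tilde{a}_l \rceil$ up to but not including $\lceil \tilde{b}_l \rceil$ (the function name is hypothetical):

```python
import math
from typing import Tuple

DFT_SIZE = 256  # length of the transform in Equation (120)


def band_edges(l: int, w0: float) -> Tuple[int, int]:
    """Return (ceil(a_l), ceil(b_l)) for harmonic l and fundamental w0 (radians)."""
    scale = DFT_SIZE / (2.0 * math.pi)
    a_l = scale * (l - 0.5) * w0   # Equation (124)
    b_l = scale * (l + 0.5) * w0   # Equation (125)
    return math.ceil(a_l), math.ceil(b_l)
```

Since $\tilde{b}_l = \tilde{a}_{l+1}$, consecutive bands tile the spectrum without gaps or overlap; for the example frequency $2\pi/35.125$ the first harmonic covers bins 4 through 10 and the second covers bins 11 through 18.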


Finally, the very low frequency and very high frequency components of $\tilde{U}_w(m)$ are set equal to
zero as shown in the following equation.

$\tilde{U}_w(m) = \begin{cases} 0 & \text{for } |m| < \lceil \tilde{a}_1 \rceil \\ 0 & \text{for } \lceil \tilde{b}_{\tilde{L}} \rceil \le |m| \le 128 \end{cases}$    (126)

The sequence $\tilde{u}_w(n)$, defined as the 256 point Inverse Discrete Fourier Transform of
$\tilde{U}_w(m)$, is the unvoiced speech for the current frame. The sequence $\tilde{u}_w(n)$ is computed as
shown in the following equation.

$\tilde{u}_w(n) = \frac{1}{256} \sum_{m=-128}^{127} \tilde{U}_w(m)\, e^{j 2\pi m n / 256}$ for $-128 \le n \le 127$    (127)

In order to generate $\tilde{s}_{uv}(n)$, $\tilde{u}_w(n)$ must be combined with the unvoiced speech from the
previous frame. This is accomplished using the Weighted Overlap Add algorithm described in (4). If
$\tilde{u}_w(n, 0)$ is used to denote the unvoiced speech for the current frame, and $\tilde{u}_w(n, -1)$ is
used to denote the unvoiced speech for the previous frame, then $\tilde{s}_{uv}(n)$ is given by

$\tilde{s}_{uv}(n) = \frac{w_S(n)\, \tilde{u}_w(n, -1) + w_S(n - N)\, \tilde{u}_w(n - N, 0)}{w_S^2(n) + w_S^2(n - N)}$ for $0 \le n < N$    (128)

In this equation $w_S(n)$ is assumed to be zero outside the range $-105 \le n \le 105$, and
$\tilde{u}_w(n, 0)$ and $\tilde{u}_w(n, -1)$ are assumed to be zero outside the range $-128 \le n \le 127$.
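
The Weighted Overlap Add step of Equation (128) can be sketched in Python as follows. The sketch takes the window and the two unvoiced sequences as callables of the sample index so that the zero-extension rules stated above stay explicit; this calling convention and the function name are illustrative assumptions, and the real window comes from Annex I.

```python
from typing import Callable, List

Signal = Callable[[int], float]


def weighted_overlap_add(u_prev: Signal, u_cur: Signal, w_s: Signal,
                         n_frame: int = 160) -> List[float]:
    """Combine previous and current unvoiced frames per Equation (128).

    u_prev(n) and u_cur(n) must return 0 outside -128..127, and w_s(n) must
    return 0 outside its support, as stated in the text.
    """
    out = []
    for n in range(n_frame):
        numerator = w_s(n) * u_prev(n) + w_s(n - n_frame) * u_cur(n - n_frame)
        denominator = w_s(n) ** 2 + w_s(n - n_frame) ** 2
        out.append(numerator / denominator)
    return out
```

The denominator never vanishes for a window that is non-zero over 209 samples, because every index in the frame is covered by the current window position, the shifted one, or both.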

11.3 Voiced Speech Synthesis 

The voiced speech is synthesized by summing a voiced signal for each spectral amplitude
according to the following equation.

$\tilde{s}_v(n) = \sum_{l=1}^{\max[\tilde{L}(-1), \tilde{L}(0)]} 2 \cdot \tilde{s}_{v,l}(n)$ for $0 \le n < N$    (129)

The reader is referred to references (1,3) for background information on the algorithm described in this
section. The voiced synthesis algorithm attempts to match the $l$'th spectral amplitude of the current frame
with the $l$'th spectral amplitude of the previous frame. The algorithm assumes that all spectral amplitudes
outside the allowed range are equal to zero as shown in Equations (130) and (131).

$\bar{M}_l(0) = 0$ for $l > \tilde{L}(0)$    (130)

$\bar{M}_l(-1) = 0$ for $l > \tilde{L}(-1)$    (131)

In addition it assumes that these spectral amplitudes are unvoiced. These assumptions are needed for the
case where the number of spectral amplitudes in the current frame is not equal to the number of spectral
amplitudes in the previous frame (i.e. $\tilde{L}(0) \ne \tilde{L}(-1)$).

The signal $\tilde{s}_{v,l}(n)$ is computed differently for each spectral amplitude. If the $l$'th spectral
amplitude is unvoiced for both the previous and current speech frame then $\tilde{s}_{v,l}(n)$ is set equal to
zero as shown in the following equation. In this case the energy in this region of the spectrum is completely
synthesized by the unvoiced synthesis algorithm described in the previous section.

$\tilde{s}_{v,l}(n) = 0$ for $0 \le n < N$    (132)

Alternatively, if the $l$'th spectral amplitude is unvoiced for the current frame and voiced for the
previous frame, then $\tilde{s}_{v,l}(n)$ is given by the following equation. In this case the energy in this
region of the spectrum transitions from the voiced synthesis algorithm to the unvoiced synthesis algorithm.

$\tilde{s}_{v,l}(n) = w_S(n)\, \bar{M}_l(-1) \cos[\tilde{\omega}_0(-1)\, n\, l + \phi_l(-1)]$ for $0 \le n < N$    (133)

Similarly, if the $l$'th spectral amplitude is voiced for the current frame and unvoiced for the
previous frame then $\tilde{s}_{v,l}(n)$ is given by the following equation. In this case the energy in this
region of the spectrum transitions from the unvoiced synthesis algorithm to the voiced synthesis algorithm.

$\tilde{s}_{v,l}(n) = w_S(n - N)\, \bar{M}_l(0) \cos[\tilde{\omega}_0(0)(n - N)\, l + \phi_l(0)]$ for $0 \le n < N$    (134)

Otherwise, if the $l$'th spectral amplitude is voiced for both the current and the previous frame, and
if either $l \ge 8$ or $|\tilde{\omega}_0(0) - \tilde{\omega}_0(-1)| \ge .1\, \tilde{\omega}_0(0)$, then
$\tilde{s}_{v,l}(n)$ is given by the following equation. In this case the energy in this region of the
spectrum is completely synthesized by the voiced synthesis algorithm.

$\tilde{s}_{v,l}(n) = w_S(n)\, \bar{M}_l(-1) \cos[\tilde{\omega}_0(-1)\, n\, l + \phi_l(-1)] + w_S(n - N)\, \bar{M}_l(0) \cos[\tilde{\omega}_0(0)(n - N)\, l + \phi_l(0)]$    (135)

The variable $n$ is restricted to the range $0 \le n < N$. The synthesis window $w_S(n)$ used in Equations
(133), (134) and (135) is assumed to be equal to zero outside the range $-105 \le n \le 105$.

A final rule is used if the $l$'th spectral amplitude is voiced for both the current and the previous
frame, and if both $l < 8$ and $|\tilde{\omega}_0(0) - \tilde{\omega}_0(-1)| < .1\, \tilde{\omega}_0(0)$. In
this case $\tilde{s}_{v,l}(n)$ is given by the following equation, and the energy in this region of the
spectrum is completely synthesized by the voiced synthesis algorithm.

$\tilde{s}_{v,l}(n) = a_l(n) \cos[\theta_l(n)]$ for $0 \le n < N$    (136)

The amplitude function $a_l(n)$ is given by

$a_l(n) = \bar{M}_l(-1) + \frac{n}{N} [\bar{M}_l(0) - \bar{M}_l(-1)]$    (137)

and the phase function $\theta_l(n)$ is given by Equations (138) through (140).

$\theta_l(n) = \phi_l(-1) + [\tilde{\omega}_0(-1) \cdot l + \Delta\omega_l(0)]\, n + [\tilde{\omega}_0(0) - \tilde{\omega}_0(-1)] \cdot \frac{l n^2}{2N}$    (138)

$\Delta\phi_l(0) = \phi_l(0) - \phi_l(-1) - [\tilde{\omega}_0(-1) + \tilde{\omega}_0(0)] \cdot \frac{lN}{2}$    (139)

$\Delta\omega_l(0) = \frac{1}{N} \left[ \Delta\phi_l(0) - 2\pi \left\lfloor \frac{\Delta\phi_l(0) + \pi}{2\pi} \right\rfloor \right]$    (140)

The phase parameter $\phi_l$ which is used in the above equations must be updated for each frame
using Equations (141) through (143). The notation $\phi_l(0)$ and $\psi_l(0)$ refers to the parameter values
in the current frame, while $\phi_l(-1)$ and $\psi_l(-1)$ denotes their counterparts in the previous frame.

$\psi_l(0) = \psi_l(-1) + [\tilde{\omega}_0(-1) + \tilde{\omega}_0(0)] \cdot \frac{lN}{2}$ for $1 \le l \le 56$    (141)

$\phi_l(0) = \begin{cases} \psi_l(0) & \text{for } 1 \le l \le \lfloor \tilde{L}(0)/4 \rfloor \\ \psi_l(0) + \frac{\tilde{L}_{uv}(0)\, \rho_l(0)}{\tilde{L}(0)} & \text{for } \lfloor \tilde{L}(0)/4 \rfloor < l \le \max[\tilde{L}(-1), \tilde{L}(0)] \end{cases}$    (142)

The parameter $\tilde{L}_{uv}(0)$ is equal to the number of unvoiced spectral amplitudes in the current frame,
and the parameter $\rho_l(0)$ used in equation (142) is defined to be a random number which is uniformly
distributed in the interval $[-\pi, \pi)$. This random number can be generated using the following equation,

$\rho_l(0) = \frac{2\pi}{53125} \cdot u(l) - \pi$    (143)

where $u(l)$ refers to the shifted noise sequence for the current frame, which is described in Section 11.2.

Note that $\psi_l(0)$ must be updated every frame using Equation (141) for $1 \le l \le 56$, regardless of the
value of $\tilde{L}$ or the value of the V/UV decisions.

Once $\tilde{s}_{v,l}(n)$ is generated for each spectral amplitude the complete voiced component is
generated according to equation (129). The synthetic speech signal is then computed by summing the
voiced and unvoiced components as shown in equation (144).

$\tilde{s}(n) = \tilde{s}_{uv}(n) + \tilde{s}_v(n)$ for $0 \le n < N$    (144)

This completes the IMBE speech synthesis algorithm.
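
The low-harmonic voiced rule of Equations (136) through (140) can be sketched in Python as follows. The sketch covers a single harmonic, takes the phases $\phi_l(0)$ and $\phi_l(-1)$ as given, and uses hypothetical function names. Its purpose is to show that the quadratic phase track starts at $\phi_l(-1)$ and arrives at $\phi_l(0)$, modulo $2\pi$, after $N$ samples.

```python
import math
from typing import List

N = 160  # samples per 20 ms frame (Section 11.1)


def delta_omega(phi_cur: float, phi_prev: float,
                w_cur: float, w_prev: float, l: int) -> float:
    """Equations (139) and (140): per-sample frequency deviation."""
    d_phi = phi_cur - phi_prev - (w_prev + w_cur) * l * N / 2.0
    # Wrap the phase mismatch into [-pi, pi) before spreading it over N samples.
    wrapped = d_phi - 2.0 * math.pi * math.floor((d_phi + math.pi) / (2.0 * math.pi))
    return wrapped / N


def theta(n: int, phi_prev: float, w_cur: float, w_prev: float,
          l: int, d_omega: float) -> float:
    """Equation (138): quadratic phase track for harmonic l."""
    return (phi_prev + (w_prev * l + d_omega) * n
            + (w_cur - w_prev) * l * n * n / (2.0 * N))


def voiced_harmonic(m_cur: float, m_prev: float, phi_cur: float, phi_prev: float,
                    w_cur: float, w_prev: float, l: int) -> List[float]:
    """Equations (136) and (137): one frame of the l'th voiced component."""
    d_omega = delta_omega(phi_cur, phi_prev, w_cur, w_prev, l)
    frame = []
    for n in range(N):
        amplitude = m_prev + (n / N) * (m_cur - m_prev)   # linear interpolation
        frame.append(amplitude * math.cos(theta(n, phi_prev, w_cur, w_prev, l, d_omega)))
    return frame
```

Wrapping the phase mismatch before dividing by $N$ keeps the frequency deviation within $\pm\pi/N$, so the interpolated track follows the smallest frequency correction that still lands on the target phase.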


Algorithm           Delay (ms)
Analysis            73.75
Quantization        0.0
FEC/Interleaving    0.0
Reconstruction      0.0
Synthesis           6.25

Table 11: Breakdown of Algorithmic Delay


12 Additional Notes 

The total algorithmic delay is 80 ms. This does not include any processing delay or transmission delay.
The breakdown of the delay is shown in Table 11. The analysis delay is due to the filtering, windowing and
two frame look-ahead used in the initial pitch estimation algorithm. The synthesis delay is introduced by the
manner in which the synthesis algorithm smoothly transitions between the parameters estimated for
consecutive speech frames.

In a few of the figures and the flow charts, the variable x is equivalent to the variable x . For 
example the variable v in Figure 15 refers to the variable v in the text. This notational discrepancy is a 
consequence of the graphical software used to produce this document. 
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Annex A: Variable Initialization 


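Variable                   Initial Value
$P_{-1}$                   100
$P_{-2}$                   100
$E_{-1}(P)$                0 for all $P$
$E_{-2}(P)$                0 for all $P$
$\xi_{max}$                100000
$\tilde{\omega}_0(-1)$     $.02985\pi$
$\tilde{M}_l(-1)$          1 for all $l$
$\bar{M}_l(-1)$            0 for all $l$
$\tilde{L}(-1)$            30
$\tilde{K}(-1)$            10
$\hat{v}_k(-1)$            0 for all $k$
$\tilde{v}_k(-1)$          0 for all $k$
$\epsilon_R$               0.0
$S_E$                      75000
$u(n)$                     $u(-105) = 3147$
$\tilde{u}_w(n, -1)$       0 for all $n$
$\phi_l(-1)$               0 for all $l$
$\psi_l(-1)$               0 for all $l$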
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Annex B: Initial Pitch Estimation Window 


n       w_I(n)        n       w_I(n)        n       w_I(n)        n       w_I(n)
-150    0.00270174    -110    0.02113325    -70     0.05359430    -30     0.08371302
-149    0.00295485    -109    0.02181198    -69     0.05446897    -29     0.08424167
-148    0.00321783    -108    0.02250030    -68     0.05534209    -28     0.08475497
-147    0.00349080    -107    0.02319806    -67     0.05621329    -27     0.08525267
-146    0.00377385    -106    0.02390509    -66     0.05708219    -26     0.08573447
-145    0.00406710    -105    0.02462122    -65     0.05794842    -25     0.08620022
-144    0.00437064    -104    0.02534628    -64     0.05881159    -24     0.08664963
-143    0.00468457    -103    0.02608007    -63     0.05967136    -23     0.08708246
-142    0.00500898    -102    0.02682242    -62     0.06052732    -22     0.08749853
-141    0.00534396    -101    0.02757312    -61     0.06137912    -21     0.08789764
-140    0.00568957    -100    0.02833197    -60     0.06222639    -20     0.08827950
-139    0.00604590    -99     0.02909875    -59     0.06306869    -19     0.08864402
-138    0.00641300    -98     0.02987323    -58     0.06390573    -18     0.08899096
-137    0.00679095    -97     0.03065519    -57     0.06473708    -17     0.08932017
-136    0.00717979    -96     0.03144442    -56     0.06556234    -16     0.08963151
-135    0.00757957    -95     0.03224064    -55     0.06638119    -15     0.08992471
-134    0.00799034    -94     0.03304362    -54     0.06719328    -14     0.09019975
-133    0.00841213    -93     0.03385311    -53     0.06799815    -13     0.09045644
-132    0.00884496    -92     0.03466884    -52     0.06879549    -12     0.09069464
-131    0.00928887    -91     0.03549054    -51     0.06958490    -11     0.09091420
-130    0.00974387    -90     0.03631795    -50     0.07036604    -10     0.09111510
-129    0.01020996    -89     0.03715080    -49     0.07113854    -9      0.09129713
-128    0.01068715    -88     0.03798876    -48     0.07190202    -8      0.09146029
-127    0.01117544    -87     0.03883159    -47     0.07265611    -7      0.09160442
-126    0.01167480    -86     0.03967896    -46     0.07340052    -6      0.09172948
-125    0.01218523    -85     0.04053057    -45     0.07413481    -5      0.09183547
-124    0.01270669    -84     0.04138612    -44     0.07485873    -4      0.09192220
-123    0.01323915    -83     0.04224531    -43     0.07557184    -3      0.09198972
-122    0.01378257    -82     0.04310780    -42     0.07627381    -2      0.09203795
-121    0.01433691    -81     0.04397328    -41     0.07696434    -1      0.09206691
-120    0.01490209    -80     0.04484143    -40     0.07764313    0       0.09207659
-119    0.01547807    -79     0.04571191    -39     0.07830978    1       0.09206691
-118    0.01606477    -78     0.04658438    -38     0.07896401    2       0.09203795
-117    0.01666212    -77     0.04745849    -37     0.07960547    3       0.09198972
-116    0.01727001    -76     0.04833391    -36     0.08023385    4       0.09192220
-115    0.01788837    -75     0.04921030    -35     0.08084887    5       0.09183547
-114    0.01851709    -74     0.05008731    -34     0.08145022    6       0.09172948
-113    0.01915605    -73     0.05096456    -33     0.08203757    7       0.09160442
-112    0.01980515    -72     0.05184171    -32     0.08261070    8       0.09146029
-111    0.02046427    -71     0.05271840    -31     0.08316927    9       0.09129713
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n 

W[(n) 

n 

wi(n) 

n 

wj(n) 

n 

wj{n) 

10 

0.09111510 

50 

0.07036604 

90 

0.03631795 

130 

0.00974387 

11 

0.09091420 

51 

0.06958490 

91 

0.03549054 

131 

0.00928887 

12 

0.09069464 

52 

0.06879549 

92 

0.03466884 

132 

0.00884496 

13 

0.09045644 

53 

0.06799815 

93 

0.03385311 

133 

0.00841213 

14 

0.09019975 

54 

0.06719328 

94 

0.03304362 

134 

0.00799034 

15 

0.08992471 

55 

0.06638119 

95 

0.03224064 

135 

0.00757957 

16 

0.08963151 

56 

0.06556234 

96 

0.03144442 

136 

0.00717979 

17 

0.08932017 

57 

0.06473708 

97 

0.03065519 

137 

0.00679095 

18 

0.08899096 

58 

0.06390573 

98 

0.02987323 

138 

0.00641300 

19 

0.08864402 

59 

0.06306869 

99 

0.02909875 

139 

0.00604590 

20 

0.08827950 

60 

0.06222639 

100 

0.02833197 

140 

0.00568957 

21 

0.08789764 

61 

0.06137912 

101 

0.02757312 

141 

0.00534396 

22 

0.08749853 

62 

0.06052732 

102 

0.02682242 

142 

0.00500898 

23 

0.08708246 

63 

0.05967136 

103 

0.02608007 

143 

0.00468457 

24 

0.08664963 

64 

0.05881159 

104 

0.02534628 

144 

0.00437064 

25 

0.08620022 

65 

0.05794842 

105 

0.02462122 

145 

0.00406710 

26 

0.08573447 

66 

0.05708219 

106 

0.02390509 

146 

0.00377385 

27 

0.08525267 

67 

0.05621329 

107 

0.02319806 

147 

0.00349080 

28 

0.08475497 

68 

0.05534209 

108 

0.02250030 

148 

0.00321783 

29 

0.08424167 

69 

0.05446897 

109 

0.02181198 

149 

0.00295485 

30 

0.08371302 

70 

0.05359430 

110 

0.02113325 

150 

0.00270174 

31 

0.08316927 

71 

0.05271840 

111 

0.02046427 



32 

0.08261070 

72 

0.05184171 

112 

0.01980515 



33 

0.08203757 

73 

0.05096456 

113 

0.01915605 



34 

0.08145022 

74 

0.05008731 

114 

0.01851709 



35 

0.08084887 

75 

0.04921030 

115 

0.01788837 



36 

0.08023385 

76 

0.04833391 

116 

0.01727001 



37 

0.07960547 

77 

0.04745849 

117 

0.01666212 



38 

0.07896401 

78 

0.04658438 

118 

0.01606477 



39 

0.07830978 

79 

0.04571191 

119 

0.01547807 



40 

0.07764313 

80 

0.04484143 

120 

0.01490209 



41 

0.07696434 

81 

0.04397328 

121 

0.01433691 



42 

0.07627381 

82 

0.04310780 

122 

0.01378257 



43 

0.07557184 

83 

0.04224531 

123 

0.01323915 



44 

0.07485873 

84 

0.04138612 

124 

0.01270669 



45 

0.07413481 

85 

0.04053057 

125 

0.01218523 



46 

0.07340052 

86 

0.03967896 

126 

0.01167480 



47 

0.07265611 

87 

0.03883159 

127 

0.01117544 



48 

0.07190202 

88 

0.03798876 

128 

0.01068715 



49 

0.07113854 

89 

0.03715080 

129 

0.01020996 
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Annex C: Pitch Refinement Window 


    n  $w_R(n)$        n  $w_R(n)$        n  $w_R(n)$        n  $w_R(n)$        n  $w_R(n)$
 -110  0.014873      -78  0.205355      -46  0.607067      -14  0.956477       18  0.928916
 -109  0.017397      -77  0.215294      -45  0.620807      -13  0.962377       19  0.921074
 -108  0.020102      -76  0.225466      -44  0.634490      -12  0.967866       20  0.912868
 -107  0.022995      -75  0.235869      -43  0.648105      -11  0.972940       21  0.904307
 -106  0.026081      -74  0.246497      -42  0.661638      -10  0.977592       22  0.895400
 -105  0.029365      -73  0.257347      -41  0.675076       -9  0.981817       23  0.886157
 -104  0.032852      -72  0.268413      -40  0.688406       -8  0.985610       24  0.876589
 -103  0.036546      -71  0.279689      -39  0.701616       -7  0.988967       25  0.866705
 -102  0.040451      -70  0.291171      -38  0.714692       -6  0.991884       26  0.856516
 -101  0.044573      -69  0.302851      -37  0.727620       -5  0.994358       27  0.846033
 -100  0.048915      -68  0.314724      -36  0.740390       -4  0.996386       28  0.835267
  -99  0.053482      -67  0.326782      -35  0.752986       -3  0.997966       29  0.824231
  -98  0.058277      -66  0.339018      -34  0.765397       -2  0.999095       30  0.812935
  -97  0.063303      -65  0.351425      -33  0.777610       -1  0.999774       31  0.801391
  -96  0.068563      -64  0.363994      -32  0.789612        0  1.000000       32  0.789612
  -95  0.074062      -63  0.376718      -31  0.801391        1  0.999774       33  0.777610
  -94  0.079801      -62  0.389588      -30  0.812935        2  0.999095       34  0.765397
  -93  0.085782      -61  0.402594      -29  0.824231        3  0.997966       35  0.752986
  -92  0.092009      -60  0.415727      -28  0.835267        4  0.996386       36  0.740390
  -91  0.098483      -59  0.428978      -27  0.846033        5  0.994358       37  0.727620
  -90  0.105205      -58  0.442337      -26  0.856516        6  0.991884       38  0.714692
  -89  0.112176      -57  0.455793      -25  0.866705        7  0.988967       39  0.701616
  -88  0.119398      -56  0.469336      -24  0.876589        8  0.985610       40  0.688406
  -87  0.126872      -55  0.482955      -23  0.886157        9  0.981817       41  0.675076
  -86  0.134596      -54  0.496640      -22  0.895400       10  0.977592       42  0.661638
  -85  0.142572      -53  0.510379      -21  0.904307       11  0.972940       43  0.648105
  -84  0.150799      -52  0.524160      -20  0.912868       12  0.967866       44  0.634490
  -83  0.159276      -51  0.537971      -19  0.921074       13  0.962377       45  0.620807
  -82  0.168001      -50  0.551802      -18  0.928916       14  0.956477       46  0.607067
  -81  0.176974      -49  0.565639      -17  0.936386       15  0.950174       47  0.593284
  -80  0.186192      -48  0.579470      -16  0.943474       16  0.943474       48  0.579470
  -79  0.195653      -47  0.593284      -15  0.950174       17  0.936386       49  0.565639
   50  0.551802       82  0.168001
   51  0.537971       83  0.159276
   52  0.524160       84  0.150799
   53  0.510379       85  0.142572
   54  0.496640       86  0.134596
   55  0.482955       87  0.126872
   56  0.469336       88  0.119398
   57  0.455793       89  0.112176
   58  0.442337       90  0.105205
   59  0.428978       91  0.098483
   60  0.415727       92  0.092009
   61  0.402594       93  0.085782
   62  0.389588       94  0.079801
   63  0.376718       95  0.074062
   64  0.363994       96  0.068563
   65  0.351425       97  0.063303
   66  0.339018       98  0.058277
   67  0.326782       99  0.053482
   68  0.314724      100  0.048915
   69  0.302851      101  0.044573
   70  0.291171      102  0.040451
   71  0.279689      103  0.036546
   72  0.268413      104  0.032852
   73  0.257347      105  0.029365
   74  0.246497      106  0.026081
   75  0.235869      107  0.022995
   76  0.225466      108  0.020102
   77  0.215294      109  0.017397
   78  0.205355      110  0.014873
   79  0.195653
   80  0.186192
   81  0.176974
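
The tabulated pitch refinement window values are even-symmetric about n = 0, so an implementation may prefer to store only the non-negative half and mirror it on lookup. The following is a minimal sketch of that idea, not part of the standard; the helper name `mirror_window` and the four-entry excerpt are illustrative assumptions, and a real table would carry all 111 values for n = 0 to 110.

```python
# Hypothetical helper: expand the non-negative half of an even-symmetric
# window into a full lookup table indexed by n.
def mirror_window(half):
    """half[i] holds w(i) for i = 0..N; returns {n: w(n)} for n = -N..N."""
    n_max = len(half) - 1
    return {n: half[abs(n)] for n in range(-n_max, n_max + 1)}


# Excerpt only: w_R(n) for n = 0..3, copied from the table above.
W_R_HALF_EXCERPT = [1.000000, 0.999774, 0.999095, 0.997966]

w_r = mirror_window(W_R_HALF_EXCERPT)
print(w_r[-2], w_r[2])  # both lookups read the same stored entry
```

The same approach would apply to the initial pitch estimation window of Annex B, whose values are likewise symmetric about n = 0.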
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Annex D: FIR Low Pass Filter 


   n  $h_{LPF}(n)$
 -10  -.002898
  -9  -.002831
  -8   .005666
  -7   .016601
  -6   .008800
  -5  -.026955
  -4  -.055990
  -3  -.015116
  -2   .118754
  -1   .278990
   0   .351338
   1   .278990
   2   .118754
   3  -.015116
   4  -.055990
   5  -.026955
   6   .008800
   7   .016601
   8   .005666
   9  -.002831
  10  -.002898
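
As an illustration of how the 21 tabulated taps might be applied, the sketch below builds the filter from its non-negative half and runs a direct-form convolution. It assumes the filter is symmetric, h(-n) = h(n), as the tabulated values suggest, and it assumes zero padding at the signal edges; the names `H_HALF`, `H_LPF` and `lowpass` are illustrative and do not come from the standard.

```python
# Taps for n = 0..10, copied from the table above; h(-n) = h(n) is assumed.
H_HALF = [0.351338, 0.278990, 0.118754, -0.015116, -0.055990, -0.026955,
          0.008800, 0.016601, 0.005666, -0.002831, -0.002898]

# Full 21-tap filter indexed by n = -10..10.
H_LPF = {n: H_HALF[abs(n)] for n in range(-10, 11)}


def lowpass(x):
    """Direct-form convolution y(n) = sum_j h(j) x(n - j), zero outside x."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for j, h in H_LPF.items():
            if 0 <= n - j < len(x):  # samples outside x are treated as zero
                acc += h * x[n - j]
        y.append(acc)
    return y


# The taps sum to roughly 1.00138, so the gain at DC is close to unity.
dc_gain = sum(H_LPF.values())
print(round(dc_gain, 6))
```

A constant input of sufficient length therefore passes through almost unchanged away from the edges, which is a quick sanity check on a transcribed coefficient table.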
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Annex E: Gain Quantizer Levels 


$b_2$  Quantizer Level      $b_2$  Quantizer Level
   0   -2.842205              32    2.653909
   1   -2.694235              33    2.780654
   2   -2.558260              34    2.925355
   3   -2.382850              35    3.076390
   4   -2.221042              36    3.220825
   5   -2.095574              37    3.402869
   6   -1.980845              38    3.585096
   7   -1.836058              39    3.784606
   8   -1.645556              40    3.955521
   9   -1.417658              41    4.155636
  10   -1.261301              42    4.314009
  11   -1.125631              43    4.444150
  12   -0.958207              44    4.577542
  13   -0.781591              45    4.735552
  14   -0.555837              46    4.909493
  15   -0.346976              47    5.085264
  16   -0.147249              48    5.254767
  17    0.027755              49    5.411894
  18    0.211495              50    5.568094
  19    0.388380              51    5.738523
  20    0.552873              52    5.919215
  21    0.737223              53    6.087701
  22    0.932197              54    6.280685
  23    1.139032              55    6.464201
  24    1.320955              56    6.647736
  25    1.483433              57    6.834672
  26    1.648297              58    7.022583
  27    1.801447              59    7.211777
  28    1.942731              60    7.471016
  29    2.118613              61    7.738948
  30    2.321486              62    8.124863
  31    2.504443              63    8.695827
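
The 64 levels above form a non-uniform 6-bit quantizer. As a rough sketch, and assuming the encoder simply selects the index whose level lies closest to the gain value (the normative selection rule is given in the body of the standard, not in this annex), the lookup could be written as follows. The names `GAIN_LEVELS` and `quantize_gain` are illustrative.

```python
import bisect

# Quantizer levels for indices 0..63, copied from the table above.
GAIN_LEVELS = [
    -2.842205, -2.694235, -2.558260, -2.382850, -2.221042, -2.095574, -1.980845, -1.836058,
    -1.645556, -1.417658, -1.261301, -1.125631, -0.958207, -0.781591, -0.555837, -0.346976,
    -0.147249, 0.027755, 0.211495, 0.388380, 0.552873, 0.737223, 0.932197, 1.139032,
    1.320955, 1.483433, 1.648297, 1.801447, 1.942731, 2.118613, 2.321486, 2.504443,
    2.653909, 2.780654, 2.925355, 3.076390, 3.220825, 3.402869, 3.585096, 3.784606,
    3.955521, 4.155636, 4.314009, 4.444150, 4.577542, 4.735552, 4.909493, 5.085264,
    5.254767, 5.411894, 5.568094, 5.738523, 5.919215, 6.087701, 6.280685, 6.464201,
    6.647736, 6.834672, 7.022583, 7.211777, 7.471016, 7.738948, 8.124863, 8.695827,
]


def quantize_gain(g):
    """Return the index of the level closest to g (assumed nearest-level rule)."""
    i = bisect.bisect_left(GAIN_LEVELS, g)  # levels are strictly increasing
    if i == 0:
        return 0
    if i == len(GAIN_LEVELS):
        return len(GAIN_LEVELS) - 1
    below, above = GAIN_LEVELS[i - 1], GAIN_LEVELS[i]
    return i - 1 if g - below <= above - g else i


def dequantize_gain(index):
    """Decoder side: look the level up by its 6-bit index."""
    return GAIN_LEVELS[index]


print(quantize_gain(0.0), dequantize_gain(quantize_gain(0.0)))
```

Because the levels increase monotonically, a binary search is sufficient; values beyond either end of the table saturate to index 0 or 63.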
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Annex F: Bit Allocation and Step Size for Transformed Gain 
Vector 


  L  $G_{m-1}$  $b_m$  $B_m$  $\Delta_m$       L  $G_{m-1}$  $b_m$  $B_m$  $\Delta_m$
  9  G_2  b_3  10  0.003100      15  G_2  b_3   7  0.024800
  9  G_3  b_4   9  0.004020      15  G_3  b_4   6  0.030150
  9  G_4  b_5   9  0.003360      15  G_4  b_5   6  0.025200
  9  G_5  b_6   9  0.002900      15  G_5  b_6   6  0.021750
  9  G_6  b_7   9  0.002640      15  G_6  b_7   5  0.036960
 10  G_2  b_3   9  0.006200      16  G_2  b_3   6  0.046500
 10  G_3  b_4   9  0.004020      16  G_3  b_4   6  0.030150
 10  G_4  b_5   8  0.006720      16  G_4  b_5   6  0.025200
 10  G_5  b_6   8  0.005800      16  G_5  b_6   5  0.040600
 10  G_6  b_7   8  0.005280      16  G_6  b_7   5  0.036960
 11  G_2  b_3   8  0.012400      17  G_2  b_3   6  0.046500
 11  G_3  b_4   8  0.008040      17  G_3  b_4   6  0.030150
 11  G_4  b_5   8  0.006720      17  G_4  b_5   5  0.047040
 11  G_5  b_6   7  0.011600      17  G_5  b_6   5  0.040600
 11  G_6  b_7   7  0.010560      17  G_6  b_7   5  0.036960
 12  G_2  b_3   8  0.012400      18  G_2  b_3   6  0.046500
 12  G_3  b_4   7  0.016080      18  G_3  b_4   5  0.056280
 12  G_4  b_5   7  0.013440      18  G_4  b_5   5  0.047040
 12  G_5  b_6   7  0.011600      18  G_5  b_6   5  0.040600
 12  G_6  b_7   7  0.010560      18  G_6  b_7   5  0.036960
 13  G_2  b_3   7  0.024800      19  G_2  b_3   6  0.046500
 13  G_3  b_4   7  0.016080      19  G_3  b_4   5  0.056280
 13  G_4  b_5   7  0.013440      19  G_4  b_5   5  0.047040
 13  G_5  b_6   6  0.021750      19  G_5  b_6   4  0.058000
 13  G_6  b_7   6  0.019800      19  G_6  b_7   4  0.052800
 14  G_2  b_3   7  0.024800      20  G_2  b_3   6  0.046500
 14  G_3  b_4   6  0.030150      20  G_3  b_4   5  0.056280
 14  G_4  b_5   6  0.025200      20  G_4  b_5   5  0.047040
 14  G_5  b_6   6  0.021750      20  G_5  b_6   4  0.058000
 14  G_6  b_7   6  0.019800      20  G_6  b_7   4  0.052800
 21  G_2  b_3   5  0.086800      27  G_2  b_3   5  0.086800
 21  G_3  b_4   5  0.056280      27  G_3  b_4   4  0.080400
 21  G_4  b_5   5  0.047040      27  G_4  b_5   4  0.067200
 21  G_5  b_6   4  0.058000      27  G_5  b_6   3  0.094250
 21  G_6  b_7   4  0.052800      27  G_6  b_7   3  0.085800
 22  G_2  b_3   5  0.086800      28  G_2  b_3   4  0.124000
 22  G_3  b_4   5  0.056280      28  G_3  b_4   4  0.080400
 22  G_4  b_5   4  0.067200      28  G_4  b_5   4  0.067200
 22  G_5  b_6   4  0.058000      28  G_5  b_6   3  0.094250
 22  G_6  b_7   4  0.052800      28  G_6  b_7   3  0.085800
 23  G_2  b_3   5  0.086800      29  G_2  b_3   4  0.124000
 23  G_3  b_4   4  0.080400      29  G_3  b_4   4  0.080400
 23  G_4  b_5   4  0.067200      29  G_4  b_5   4  0.067200
 23  G_5  b_6   4  0.058000      29  G_5  b_6   3  0.094250
 23  G_6  b_7   4  0.052800      29  G_6  b_7   3  0.085800
 24  G_2  b_3   5  0.086800      30  G_2  b_3   4  0.124000
 24  G_3  b_4   4  0.080400      30  G_3  b_4   4  0.080400
 24  G_4  b_5   4  0.067200      30  G_4  b_5   4  0.067200
 24  G_5  b_6   4  0.058000      30  G_5  b_6   3  0.094250
 24  G_6  b_7   4  0.052800      30  G_6  b_7   3  0.085800
 25  G_2  b_3   5  0.086800      31  G_2  b_3   4  0.124000
 25  G_3  b_4   4  0.080400      31  G_3  b_4   4  0.080400
 25  G_4  b_5   4  0.067200      31  G_4  b_5   3  0.109200
 25  G_5  b_6   4  0.058000      31  G_5  b_6   3  0.094250
 25  G_6  b_7   3  0.085800      31  G_6  b_7   3  0.085800
 26  G_2  b_3   5  0.086800      32  G_2  b_3   4  0.124000
 26  G_3  b_4   4  0.080400      32  G_3  b_4   4  0.080400
 26  G_4  b_5   4  0.067200      32  G_4  b_5   3  0.109200
 26  G_5  b_6   3  0.094250      32  G_5  b_6   3  0.094250
 26  G_6  b_7   3  0.085800      32  G_6  b_7   3  0.085800

TIA/EIA/IS-69.5 


  L  $G_{m-1}$  $b_m$  $B_m$  $\Delta_m$       L  $G_{m-1}$  $b_m$  $B_m$  $\Delta_m$
 33  G_2  b_3   4  0.124000      39  G_2  b_3   4  0.124000
 33  G_3  b_4   3  0.130650      39  G_3  b_4   3  0.130650
 33  G_4  b_5   3  0.109200      39  G_4  b_5   3  0.109200
 33  G_5  b_6   3  0.094250      39  G_5  b_6   3  0.094250
 33  G_6  b_7   3  0.085800      39  G_6  b_7   2  0.112200
 34  G_2  b_3   4  0.124000      40  G_2  b_3   4  0.124000
 34  G_3  b_4   3  0.130650      40  G_3  b_4   3  0.130650
 34  G_4  b_5   3  0.109200      40  G_4  b_5   3  0.109200
 34  G_5  b_6   3  0.094250      40  G_5  b_6   3  0.094250
 34  G_6  b_7   3  0.085800      40  G_6  b_7   2  0.112200
 35  G_2  b_3   4  0.124000      41  G_2  b_3   4  0.124000
 35  G_3  b_4   3  0.130650      41  G_3  b_4   3  0.130650
 35  G_4  b_5   3  0.109200      41  G_4  b_5   3  0.109200
 35  G_5  b_6   3  0.094250      41  G_5  b_6   2  0.123250
 35  G_6  b_7   3  0.085800      41  G_6  b_7   2  0.112200
 36  G_2  b_3   4  0.124000      42  G_2  b_3   4  0.124000
 36  G_3  b_4   3  0.130650      42  G_3  b_4   3  0.130650
 36  G_4  b_5   3  0.109200      42  G_4  b_5   3  0.109200
 36  G_5  b_6   3  0.094250      42  G_5  b_6   2  0.123250
 36  G_6  b_7   3  0.085800      42  G_6  b_7   2  0.112200
 37  G_2  b_3   4  0.124000      43  G_2  b_3   4  0.124000
 37  G_3  b_4   3  0.130650      43  G_3  b_4   3  0.130650
 37  G_4  b_5   3  0.109200      43  G_4  b_5   3  0.109200
 37  G_5  b_6   3  0.094250      43  G_5  b_6   2  0.123250
 37  G_6  b_7   2  0.112200      43  G_6  b_7   2  0.112200
 38  G_2  b_3
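
Each row of this annex pairs a bit count B_m with a step size Δ_m, which is the information a uniform scalar quantizer needs. The sketch below shows one common way such a pair is used: a mid-rise uniform quantizer with saturation at both ends. This form is an assumption made for illustration; the normative encoding and reconstruction rules are those in the body of the standard, and the function names are hypothetical.

```python
import math


def quantize_uniform(g, bits, step):
    """Map g to an integer index in 0 .. 2**bits - 1 (assumed mid-rise rule)."""
    levels = 1 << bits
    index = math.floor(g / step) + (levels >> 1)
    return max(0, min(levels - 1, index))  # saturate out-of-range values


def dequantize_uniform(index, bits, step):
    """Reconstruct the midpoint of the cell selected by index (bits >= 1)."""
    return step * (index - (1 << (bits - 1)) + 0.5)


# Example with the first tabulated row: L = 9, G_2, B_m = 10, step 0.003100.
idx = quantize_uniform(0.01, 10, 0.003100)
print(idx, dequantize_uniform(idx, 10, 0.003100))
```

Under this assumed rule the reconstruction error for in-range inputs is at most half a step, which is consistent with the table pairing fewer bits with larger step sizes as L grows.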
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Annex G: Bit Allocation for Higher Order DCT Coefficients 
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Annex I: Speech Synthesis Window 
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Annex J: Log Magnitude Prediction Residual Block Lengths 
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Annex K: Flow Charts 
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Flow Chart 2: Initial Pitch Estimation 
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Flow Chart 3: Look-Back Pitch Tracking 
Flow Chart 4: Look-Ahead Pitch Tracking (1 of 3)
Flow Chart 4: Look-Ahead Pitch Tracking (2 of 3)
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Flow Chart 4: Look-Ahead Pitch Tracking (3 of 3) 


Flow Chart 5: V/UV Determination (1 of 2)
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Flow Chart 5: V/UV Determination (2 of 2) 
Flow Chart 6: Unvoiced Speech Synthesis (1 of 2)
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Flow Chart 6: Unvoiced Speech Synthesis (2 of 2) 
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Flow Chart 7: Voiced Speech Synthesis (1 of 2) 
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Flow Chart 7: Voiced Speech Synthesis (2 of 2) 
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Flow Chart 8: Spectral Amplitude Enhancement 
Flow Chart 9: Adaptive Smoothing (1 of 2)
Flow Chart 9: Adaptive Smoothing (2 of 2)
Flow Chart 10: Encoder Bit Manipulations (1 of 2)
Flow Chart 10: Encoder Bit Manipulations (2 of 2)
Flow Chart 11: Decoder Bit Manipulations (1 of 2)
Flow Chart 11: Decoder Bit Manipulations (2 of 2)
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Flow Chart 12: Pitch Refinement (1 of 2) 
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Flow Chart 12: Pitch Refinement (2 of 2) 
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